This R package finds robust subgroups in data with a single continuous response, suitable for either regression or treatment effect models. Subgroups are identified via recursive partitioning, which yields an interpretable tree. Conformal prediction methods (SCR, CV+ and Jackknife+) are leveraged to simultaneously optimize inter-group heterogeneity and intra-group homogeneity. First, predictions are made using an arbitrary regression learner from the 100+ algorithms available in tidymodels. Then, the data is split recursively using the robust conformal criterion. In this way, conftree is an extension of the R2P algorithm from Lee et al. (NeurIPS 2020).
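To make the conformal idea concrete, here is a minimal, standalone sketch of split conformal regression (SCR) in base R. It is not conftree's implementation: the lm() learner, the simulated data and the 50/50 split are assumptions chosen for illustration. It shows how calibration residuals turn point predictions into intervals with a target miscoverage rate alpha.

```r
# Minimal SCR sketch. Assumptions: lm() learner, simulated data, 50/50 split.
# This is NOT conftree's implementation, only an illustration of the idea.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

# Split into a proper training half and a calibration half.
idx_train <- sample(n, n / 2)
train <- dat[idx_train, ]
calib <- dat[-idx_train, ]

# Fit the learner on the training half only.
fit <- lm(y ~ x, data = train)

# Absolute residuals on the calibration half are the conformity scores.
scores <- abs(calib$y - predict(fit, newdata = calib))

# Conformal quantile: the ceiling((n_cal + 1) * (1 - alpha))-th smallest score.
alpha <- 0.05
n_cal <- nrow(calib)
k <- ceiling((n_cal + 1) * (1 - alpha))
q_hat <- sort(scores)[k]

# Prediction intervals for new points: prediction +/- q_hat.
new_x <- data.frame(x = c(2.5, 7.5))
pred <- predict(fit, newdata = new_x)
intervals <- cbind(lower = pred - q_hat, upper = pred + q_hat)
print(intervals)
```

Every SCR interval has the same width, 2 * q_hat. Locally narrow or wide intervals are what let a criterion like R2P's compare the homogeneity of candidate subgroups.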
Installation
You can install the current development version from GitHub with:
if (!require("remotes")) {
  install.packages("remotes")
}
remotes::install_github("holgstr/conftree")
Quickstart
Let’s find subgroups in the Washington bike share data. We use a random forest from tidymodels as the learner, a 5% miscoverage rate as alpha, and 10 cv_folds for CV+ to quantify the uncertainty in the resulting subgroups:
library(conftree)
library(tidymodels)
data(bikes)
set.seed(1234)
# Specify the learner to be used for model training:
forest <- rand_forest() %>%
  set_mode("regression") %>%
  set_engine("ranger")
# Find optimal subgroups:
groups <- r2p(
  data = bikes,
  target = "count",
  learner = forest,
  cv_folds = 10,
  alpha = 0.05,
  gamma = 0.2,
  lambda = 0.5,
  max_groups = 4
)
# Display tree structure:
groups$tree
#> [1] root
#> | [2] weekday in Sun: *
#> | [3] weekday in Mon, Tue, Wed, Thu, Fri, Sat
#> | | [4] weekday in Sat: *
#> | | [5] weekday in Sun, Mon, Tue, Wed, Thu, Fri
#> | | | [6] temp <= 6.15: *
#> | | | [7] temp > 6.15: *
# Plot:
plot(groups)
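For intuition about what cv_folds and alpha control, the following base R sketch computes a CV+ interval (Barber et al.) for a single new point. It is an illustration under stated assumptions, not conftree's code: the lm() learner and simulated data are assumptions, and the variables cv_folds and alpha only mirror the r2p() argument names. Each training point contributes its out-of-fold residual and the prediction of the model that did not see it.

```r
# Minimal CV+ sketch. Assumptions: lm() learner, simulated data.
# cv_folds and alpha mirror r2p() argument names only; this is NOT conftree's code.
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

cv_folds <- 10
alpha <- 0.05
folds <- sample(rep(seq_len(cv_folds), length.out = n))

x_new <- data.frame(x = 5)
resid_oof <- numeric(n)  # out-of-fold absolute residuals R_i
pred_new <- numeric(n)   # prediction at x_new from the model that excluded point i

for (k in seq_len(cv_folds)) {
  in_fold <- folds == k
  # Fit on all folds except fold k.
  fit_k <- lm(y ~ x, data = dat[!in_fold, ])
  resid_oof[in_fold] <- abs(dat$y[in_fold] - predict(fit_k, newdata = dat[in_fold, ]))
  pred_new[in_fold] <- predict(fit_k, newdata = x_new)
}

# CV+ bounds: lower and upper order statistics of pred -/+ residual.
lower <- sort(pred_new - resid_oof)[floor(alpha * (n + 1))]
upper <- sort(pred_new + resid_oof)[ceiling((1 - alpha) * (n + 1))]
print(c(lower = lower, upper = upper))
```

More folds mean each model is trained on more data, at the cost of more fits. A smaller alpha pushes the bounds further into the tails and widens the interval.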