Skip to contents

This R package finds subgroups in data with a single continuous response, suitable for either regression or treatment effect problems. Subgroups are identified via recursive partitioning, resulting in an interpretable tree. Conformal prediction methods (SCR, CV+ and Jackknife+) are leveraged to find homogeneous subgroups through robust uncertainty quantification. First, predictions are made using an arbitrary regression learner from the 100+ algorithms available in tidymodels. Then, the data is split recursively using the robust conformal criterion. The conftree package implements and extends the theory from Lee et al. (NeurIPS 2020).

Scope:

  • Subgroups for regression problems with r2p()
  • Subgroups for treatment effects with r2p_hte()

Installation

You can install the current development version from GitHub with:

if (!require("remotes")) {
  install.packages("remotes")
}
remotes::install_github("holgstr/conftree")

Quickstart

Let’s find subgroups in the Washington bike share data. We use a random forest from tidymodels as learner, a 5% miscoverage rate as alpha, and 10 cv_folds for the CV+ to quantify the uncertainty in subgroups:

library(conftree)
library(tidymodels)
data(bikes)

set.seed(1234)

# Specify the learner to be used for model training:
forest <- rand_forest() %>%
  set_mode("regression") %>%
  set_engine("ranger")

# Find optimal subgroups:
groups <- r2p(
  data = bikes,
  target = "count",
  learner = forest,
  cv_folds = 10,
  alpha = 0.05,
  gamma = 0.01,
  lambda = 0.5,
  max_groups = 3
)

# Display tree structure:
groups$tree
#> [1] root
#> |   [2] weekday in Sun, Sat: *
#> |   [3] weekday in Mon, Tue, Wed, Thu, Fri
#> |   |   [4] weather in rain: *
#> |   |   [5] weather in clear, misty: *

# Plot:
plot(groups)