Builds a regression tree for longitudinal or functional data using the spline projection method. The tree is built with the rpart package, and the resulting spline tree is an rpart object with additional stored information. The parameters df, knots, degree, and intercept customize the spline basis used for projection. The parameters nGrid and gridPoints control the grid on which the projection sum of squares is evaluated. The parameters minNodeSize and cp control the size of the final tree.

splineTree(splitFormula, tformula, idvar, data, knots = NULL,
  df = NULL, degree = 3, intercept = FALSE, nGrid = 7,
  gridPoints = NULL, minNodeSize = 10, cp = 0.01)

Arguments

splitFormula

Formula specifying the longitudinal response variable and the time-constant variables that will be used for splitting in the tree.

tformula

Formula specifying the longitudinal response variable and the variable that acts as the time variable.

idvar

The name of the variable that serves as the ID variable for grouping observations. Must be a string.

data

Data frame in long format that contains all variables specified in the formulas.

knots

Specified locations for internal knots in the spline basis. Defaults to NULL, which corresponds to no internal knots.

df

Degrees of freedom of the spline basis. If this is specified but the knots parameter is NULL, then the appropriate number of internal knots will be added at quantiles of the training data. If both df and knots are unspecified, the spline basis will have no internal knots. If knots is specified, this parameter will be ignored.

degree

Degree of the spline basis used for projection. Defaults to 3.

intercept

Specifies whether or not the set of basis functions will include the intercept function. Defaults to FALSE, which means that the tree will split based on trajectory shape, ignoring response level.

nGrid

Number of grid points at which to evaluate the projection sum of squares. Used only if gridPoints is not supplied, in which case the grid points are placed at equally spaced quantiles of the time variable. Defaults to 7.

gridPoints

Optional. A vector of numbers that will be used as the grid on which to evaluate the projection sum of squares. Should fall roughly within the range of the time variable.

minNodeSize

Minimum number of observational units that can be in a terminal node. Controls tree size and helps avoid overfitting. Defaults to 10.

cp

Complexity parameter passed to the rpart building process. Controls tree size. Defaults to the rpart default of 0.01.
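The way df, knots, degree, and intercept interact can be sketched with splines::bs from base R, which follows the same conventions for a B-spline basis. This is an illustration under assumptions rather than the package's internal code: the time vector is simulated, and the quantile probabilities used for the grid are one plausible reading of "equally spaced quantiles".

```r
library(splines)  # ships with base R

set.seed(1)
time <- sort(runif(200, min = 18, max = 50))  # hypothetical time variable

# No df and no knots: a cubic basis without intercept has `degree` columns.
b0 <- bs(time, degree = 3, intercept = FALSE)
ncol(b0)

# df given, knots NULL: bs() places df - degree internal knots at quantiles.
b1 <- bs(time, df = 5, degree = 3, intercept = FALSE)
attr(b1, "knots")  # two internal knots
ncol(b1)

# Same internal knots plus the intercept function: one extra column.
b2 <- bs(time, knots = attr(b1, "knots"), degree = 3, intercept = TRUE)
ncol(b2)

# Assumed reading of nGrid: grid points at equally spaced quantiles of time.
nGrid <- 7
grid <- quantile(time, probs = seq(0, 1, length.out = nGrid))
grid
```

With intercept = FALSE the basis dimension is the number of internal knots plus degree, and intercept = TRUE adds one more column for the same knots.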

Value

An rpart object with additional splinetree-specific information stored in model$parms. The important attributes of the rpart object include model$frame, model$where, and model$cptable. model$frame holds information about each node in the tree. The ith entry in model$where gives the row of model$frame that describes the node containing the ith individual in the flattened dataset. model$cptable lists the complexity parameters needed to prune the tree to various sizes. model$parms$flat_data holds the flattened dataset that was used to build the tree. model$parms also holds the boundary knots and the internal knots of the spline basis used to build the tree, which are sometimes important to recover later.
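Because the return value is an rpart object, the relationship between frame, where, and cptable can be explored on any rpart fit. The sketch below uses an ordinary classification tree on the kyphosis data that ships with rpart, not a spline tree, purely to show how these fields line up. The same fields should be available on a splineTree result, per the description above.

```r
library(rpart)

# An ordinary rpart fit, used here only as a stand-in for a spline tree.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# fit$where[i] is the row of fit$frame for the terminal node holding
# observation i, so indexing frame by where gives one row per observation.
leaf_rows <- fit$frame[fit$where, ]
nrow(leaf_rows) == nrow(kyphosis)
all(leaf_rows$var == "<leaf>")

# cptable lists the complexity parameter associated with each tree size.
fit$cptable

# Prune back using the cp value from the second row of the table.
pruned <- prune(fit, cp = fit$cptable[2, "CP"])
nrow(pruned$frame)
```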

Examples

nlsySample_subset <- nlsySample[nlsySample$ID %in% sample(unique(nlsySample$ID), 500),]
splitForm <- ~HISP+WHITE+BLACK+HGC_MOTHER+HGC_FATHER+SEX+Num_sibs
tree1 <- splineTree(splitForm, BMI~AGE, 'ID', nlsySample_subset, degree=3, intercept=TRUE, cp=0.005)
stPrint(tree1)
#> n= 500,
#>
#> node), split, n , coefficients
#>       * denotes terminal node
#>
#>  1) root, 500, (21.46013, 4.0552900, 7.153639, 7.341742)
#>    2) WHITE< 0.5, 242, (21.03874, 5.6282150, 8.895810, 8.144596)
#>      4) HGC_FATHER< 0.5, 15, (20.43773, 14.3085200, 7.333401, 11.883230)*
#>      5) HGC_FATHER>=0.5, 227, (21.07846, 5.0546260, 8.999053, 7.897551)*
#>    3) WHITE>=0.5, 258, (21.85539, 2.5799100, 5.519510, 6.588677)
#>      6) SEX< 1.5, 137, (22.73525, 3.3813550, 5.517115, 7.342290)
#>       12) HGC_MOTHER< 13.5, 108, (22.91297, 4.0432710, 5.609483, 7.528207)*
#>       13) HGC_MOTHER>=13.5, 29, (22.07340, 0.9162879, 5.173127, 6.649911)*
#>      7) SEX>=1.5, 121, (20.85918, 1.6724900, 5.522222, 5.735412)
#>       14) HGC_MOTHER< 14.5, 106, (21.09507, 1.5462340, 6.043849, 5.757517)
#>         28) HGC_FATHER< 15.5, 92, (21.10598, 1.0854990, 5.628914, 5.910522)
#>           56) HGC_MOTHER< 11.5, 30, (21.75568, 1.6616190, 6.913787, 6.584619)*
#>           57) HGC_MOTHER>=11.5, 62, (20.79161, 0.8067319, 5.007201, 5.584346)*
#>         29) HGC_FATHER>=15.5, 14, (21.02338, 4.5739200, 8.770563, 4.752059)*
#>       15) HGC_MOTHER>=14.5, 15, (19.19220, 2.5646940, 1.836060, 5.579204)*
stPlot(tree1)