What a regression tree actually returns is the mean of the dependent variable (here Y) over the training samples that end up in the respective terminal node (leaf). These means are shown as the lists named value in the picture; each has length 10 here because your Y is 10-dimensional.
In other words, and using the leftmost terminal node (leaf) of your tree as an example:
- The leaf consists of the 42 samples for which X[0] <= 0.675 and X[1] <= 0.5
- The mean value of your 10-dimensional output for these 42 samples is given in the value list of this leaf, which is indeed of length 10: the mean of Y[0] is -152007.382, the mean of Y[1] is -206040.675, and so on, up to the mean of Y[9], which is 3211.487.
You can confirm that this is the case by predicting some samples (from your training or test set, it doesn't matter) and checking that your 10-dimensional result is one of the 4 value lists depicted in the leaves above.
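A minimal sketch of that check, assuming scikit-learn's DecisionTreeRegressor and synthetic stand-in data (the arrays X and Y below are made up for illustration, they are not your data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 140 samples, 2 features, 10-dim target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(140, 2))
Y = rng.normal(size=(140, 10))

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y)
tree = reg.tree_

# Leaves are the nodes without children (children_left == -1).
leaf_ids = np.flatnonzero(tree.children_left == -1)
leaf_values = tree.value[leaf_ids, :, 0]  # shape: (n_leaves, 10)

# Every prediction should coincide with the value list of some leaf.
pred = reg.predict(X)
matches_some_leaf = [
    any(np.allclose(p, v) for v in leaf_values) for p in pred
]
print(all(matches_some_leaf))

# Each leaf's value list should be the mean of Y over the samples reaching it.
reached = reg.apply(X)  # index of the leaf each sample ends up in
leaf_means_ok = all(
    np.allclose(tree.value[leaf, :, 0], Y[reached == leaf].mean(axis=0))
    for leaf in leaf_ids
)
print(leaf_means_ok)
```

With your own fitted model you would replace X, Y and reg accordingly; the checks themselves do not depend on the synthetic data.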
Additionally, you can confirm that, for each element in value, the average of the child nodes, weighted by their sample counts, equals the respective element of the parent node. Again, using the first element of your 2 leftmost terminal nodes (leaves), we get:
(-42*152007.382 - 56*199028.147)/98
# -178876.39057142858
i.e. the value[0] element of their parent node (the leftmost node in the intermediate level). One more example, this time for the first value elements of your 2 intermediate nodes:
(-98*178876.391 + 42*417378.245)/140
# -0.00020000000617333822
which again agrees with the -0.0 first value element of your root node.
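The same parent/child consistency can be checked for every internal node and every element of value at once. This is only a sketch under assumptions: it uses scikit-learn's fitted tree_ attributes (children_left, children_right, n_node_samples, value) on synthetic stand-in data, and it assumes no sample weights were passed to fit, so that the plain sample counts are the right weights.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 140 samples, 2 features, 10-dim target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(140, 2))
Y = rng.normal(size=(140, 10))

tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y).tree_

consistent = True
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf: no children to compare against
        continue
    n_left = tree.n_node_samples[left]
    n_right = tree.n_node_samples[right]
    # Sample-weighted average of the two children, for all 10 outputs at once.
    weighted = (n_left * tree.value[left] + n_right * tree.value[right]) / (n_left + n_right)
    consistent = consistent and bool(np.allclose(weighted, tree.value[node]))

print(consistent)
```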
Judging from the value list of your root node, it seems that the mean values of all elements of your 10-dimensional Y are almost zero, which you can (and should) verify manually, as a final confirmation.
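That manual verification could look roughly like the following, again on synthetic stand-in data and assuming scikit-learn, where node 0 of the fitted tree_ is the root:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 140 samples, 2 features, 10-dim target.
rng = np.random.default_rng(0)
X = rng.uniform(size=(140, 2))
Y = rng.normal(size=(140, 10))

reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, Y)

root_value = reg.tree_.value[0, :, 0]  # value list of the root node
overall_mean = Y.mean(axis=0)          # per-output mean over the whole training set

print(np.allclose(root_value, overall_mean))
```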
So, to wrap up:
- The value list of each node contains the mean Y values for the training samples "belonging" to the respective node
- Additionally, for the terminal nodes (leaves), these lists are the actual outputs of the tree model (i.e. the output will always be one of these lists, depending on X)
- For the root node, the value list contains the mean Y values for the whole of your training dataset