
Edward Kim, 01/10/2014 11:46 AM

h1. Trees for Photo-Z

TPZ [1] is a parallel, supervised machine learning algorithm that uses prediction
trees and random forest techniques to produce both robust photometric redshift
PDFs and ancillary information for a galaxy sample. A prediction tree is built
by asking a sequence of questions that recursively split the input data taken
from the spectroscopic sample, typically into two branches, until a terminal
leaf is created that meets a stopping criterion (e.g., a minimum leaf size or
a variance threshold). At each node, the data are split along the dimension
with the highest information gain among a random subsample of dimensions
drawn at that node. This process produces less correlated trees and allows
several configurations within the data to be explored. The small region
bounding the data in a terminal leaf node represents a specific subsample of
the entire data set with similar properties. Within this leaf, a model is applied
that provides a fairly comprehensible prediction, especially in situations where
many variables interact in a nonlinear manner, as is often the case with
photo-z estimation.
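
The tree-building procedure described above can be illustrated with a minimal sketch. This is not the TPZ implementation: it assumes a regression tree in which information gain is measured as the reduction in summed squared deviation of the redshifts, binary splits only, and the mean redshift as the leaf model. The names @best_split@, @build_tree@, @predict@, @n_sub@, @min_leaf@ and @min_var@ are hypothetical and chosen only for this example.

```python
import numpy as np


def best_split(X, z, dims, min_leaf):
    """Return (gain, dim, threshold) of the best split among `dims`, or None.

    Assumption: gain is the reduction in summed squared deviation of z.
    """
    n = len(z)
    parent = np.var(z) * n
    best = None
    for d in dims:
        values = np.unique(X[:, d])
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in (values[:-1] + values[1:]) / 2.0:
            left = X[:, d] <= t
            n_left = int(left.sum())
            if n_left < min_leaf or n - n_left < min_leaf:
                continue  # would violate the minimum leaf size
            gain = (parent
                    - np.var(z[left]) * n_left
                    - np.var(z[~left]) * (n - n_left))
            if best is None or gain > best[0]:
                best = (gain, int(d), float(t))
    return best


def build_tree(X, z, rng, n_sub, min_leaf=5, min_var=1e-4):
    """Recursively split (X, z) until a stopping criterion is met."""
    leaf = {"leaf": True, "mean": float(np.mean(z)), "n": len(z)}
    # Stopping criteria: too few objects to split, or variance below threshold.
    if len(z) < 2 * min_leaf or np.var(z) < min_var:
        return leaf
    # Draw a fresh random subsample of dimensions at this node.
    dims = rng.choice(X.shape[1], size=n_sub, replace=False)
    split = best_split(X, z, dims, min_leaf)
    if split is None or split[0] <= 0.0:
        return leaf
    _, dim, thr = split
    left = X[:, dim] <= thr
    return {
        "leaf": False, "dim": dim, "thr": thr,
        "left": build_tree(X[left], z[left], rng, n_sub, min_leaf, min_var),
        "right": build_tree(X[~left], z[~left], rng, n_sub, min_leaf, min_var),
    }


def predict(tree, x):
    """Descend to the terminal leaf containing x and return its mean z."""
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["dim"]] <= tree["thr"] else tree["right"]
    return tree["mean"]
```

Here @X@ would hold the photometric attributes (for example magnitudes and colors) of the spectroscopic sample and @z@ the corresponding spectroscopic redshifts. Drawing the dimensions again at every node, rather than once per tree, is what keeps trees built from the same data from being strongly correlated.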