Edward Kim, 01/10/2014 11:46 AM

Trees for Photo-Z

TPZ [1] is a parallel, supervised machine learning algorithm that uses prediction
trees and random forest techniques to produce both robust photometric redshift
PDFs and ancillary information for a galaxy sample. A prediction tree is built
by asking a sequence of questions that recursively split the input data taken
from the spectroscopic sample, typically into two branches, until a terminal
leaf is created that meets a stopping criterion (e.g., a minimum leaf size or
a variance threshold). At every node, the dimension along which the data is split
is the one with the highest information gain among a random subsample of
dimensions drawn at that node. This random subsampling produces less correlated
trees and allows several configurations within the data to be explored. The small region
bounding the data in the terminal leaf node represents a specific subsample of
the entire data with similar properties. Within this leaf, a model is applied that
provides a fairly comprehensible prediction, especially in situations where many
variables may exist that interact in a nonlinear manner, as is often the case with
photo-z estimation.
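
The tree-building procedure described above can be illustrated with a minimal sketch. This is not the TPZ implementation: the function names (best_split, build_tree, predict), the dictionary-based tree layout, the mock data, and the use of variance reduction as a stand-in for information gain are all assumptions made for illustration. The leaf "model" here is simply the mean spectroscopic redshift of the objects in the leaf.

```python
import numpy as np

def best_split(X, z, dims, min_leaf):
    """Return (gain, dim, threshold) of the best split among `dims`, or None.

    Gain is the reduction in the sum of squared redshift residuals, used
    here as an assumed stand-in for information gain.
    """
    best = None
    parent_sse = np.sum((z - z.mean()) ** 2)
    for d in dims:
        for t in np.unique(X[:, d])[:-1]:        # candidate thresholds
            left = X[:, d] <= t
            n_left = left.sum()
            if n_left < min_leaf or len(z) - n_left < min_leaf:
                continue                         # would violate minimum leaf size
            zl, zr = z[left], z[~left]
            sse = np.sum((zl - zl.mean()) ** 2) + np.sum((zr - zr.mean()) ** 2)
            gain = parent_sse - sse
            if best is None or gain > best[0]:
                best = (gain, d, t)
    return best

def build_tree(X, z, rng, n_sub, min_leaf=5):
    """Recursively split (X, z) into two branches until a stopping criterion is met."""
    # Draw a fresh random subsample of dimensions at every node.
    dims = rng.choice(X.shape[1], size=n_sub, replace=False)
    split = best_split(X, z, dims, min_leaf) if len(z) >= 2 * min_leaf else None
    if split is None or split[0] <= 0:
        return {"leaf": True, "z": z}            # terminal leaf keeps its redshifts
    _, d, t = split
    left = X[:, d] <= t
    return {"leaf": False, "dim": d, "thr": t,
            "lo": build_tree(X[left], z[left], rng, n_sub, min_leaf),
            "hi": build_tree(X[~left], z[~left], rng, n_sub, min_leaf)}

def predict(tree, x):
    """Follow the sequence of questions down to a leaf and return its mean redshift."""
    while not tree["leaf"]:
        tree = tree["lo"] if x[tree["dim"]] <= tree["thr"] else tree["hi"]
    return tree["z"].mean()

# Mock data (hypothetical): 4 photometric attributes, redshift set by the first one.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 4))
z = np.where(X[:, 0] > 0.5, 1.0, 0.2)
tree = build_tree(X, z, rng, n_sub=2)
z_phot = predict(tree, np.array([0.9, 0.5, 0.5, 0.5]))
```

Building many such trees on resampled training data and combining the redshifts stored in the matching leaves is the random forest step. Because each leaf keeps its full set of redshifts rather than only their mean, a distribution can be formed from them instead of a single point estimate.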