Trees for Photo-Z » History » Version 8

Edward Kim, 01/10/2014 12:39 PM

h1. Trees for Photo-Z

Under construction.

h2. Introduction

!{width:300px}example_tree.png!

"TPZ":http://lcdm.astro.illinois.edu/research/papers/tpz.html is a parallel, supervised machine learning algorithm that uses prediction trees and random forest techniques to produce both robust photometric redshift PDFs and ancillary information for a galaxy sample. A prediction tree is built by asking a sequence of questions that recursively split the input data, taken from the spectroscopic sample, frequently into two branches, until a terminal leaf is created that meets a stopping criterion (e.g., a minimum leaf size or a variance threshold). At every split, a random subsample of dimensions is drawn, and the data are divided along the dimension in that subsample with the highest information gain. This randomization produces less correlated trees and makes it possible to explore several configurations within the data. The small region bounding the data in a terminal leaf node represents a specific subsample of the entire data set with similar properties. Within this leaf, a model is applied that provides a fairly comprehensible prediction, especially in situations where many variables interact in a nonlinear manner, as is often the case with photo-z estimation.
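
The split selection described above can be illustrated with a minimal sketch. It is not the TPZ implementation: it assumes the target has been binned into discrete labels (here two hypothetical redshift bins), measures information gain as the reduction in Shannon entropy, and tries midpoints between consecutive values as thresholds. The function names (@entropy@, @information_gain@, @best_split@) and the toy data are invented for illustration.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, dim, threshold):
    """Entropy reduction from splitting the data on rows[i][dim] < threshold."""
    left = [y for x, y in zip(rows, labels) if x[dim] < threshold]
    right = [y for x, y in zip(rows, labels) if x[dim] >= threshold]
    if not left or not right:
        return 0.0  # a split that leaves one branch empty gains nothing
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children

def best_split(rows, labels, n_try, rng):
    """Return (gain, dim, threshold) for the best split among n_try random dimensions."""
    dims = rng.sample(range(len(rows[0])), n_try)  # random subsample of dimensions
    best = (0.0, None, None)
    for dim in dims:
        values = sorted({x[dim] for x in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            gain = information_gain(rows, labels, dim, threshold)
            if gain > best[0]:
                best = (gain, dim, threshold)
    return best

# Toy data (made up): dimension 0 separates the two bins, dimension 1 is noise.
rows = [(18.0, 0.3), (18.5, 0.9), (21.0, 0.4), (21.5, 0.8)]
labels = ["low_z", "low_z", "high_z", "high_z"]
gain, dim, threshold = best_split(rows, labels, n_try=2, rng=random.Random(0))
```

Applying @best_split@ recursively to each branch until a stopping criterion is met would grow one tree; drawing a fresh subsample of dimensions at every node is what decorrelates the trees in the forest.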

h2. Initial Test

In this initial test, we illustrate the capabilities of TPZ by using the following set of attributes:

* mag_model in g, r, i, z, y bands
* mag_psf in g, r, i, z, y bands

and their respective errors. For training, we require that mag_model and mag_psf both be less than 99. We build 100 trees by using 10 different sets of 4 random attributes, each with 10 different random attribute perturbations. The 100 trees then vote to produce a probabilistic classification: if 96 trees vote galaxy and the remaining 4 vote star, the object is a galaxy with 96% probability.
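
The bookkeeping in this test can be sketched in a few lines. This is an illustrative sketch rather than the TPZ code, and it rests on several assumptions: the column names (@mag_model_g@, @mag_psf_g_err@, and so on) are hypothetical, the 4 attributes of each set are drawn from all 20 columns (magnitudes plus errors), and each perturbation is represented only by a random seed.

```python
import random
from collections import Counter

BANDS = ["g", "r", "i", "z", "y"]
KINDS = ("mag_model", "mag_psf")
# The 10 magnitudes listed above plus one error column each (names are hypothetical).
ATTRIBUTES = [f"{kind}_{band}" for kind in KINDS for band in BANDS]
ATTRIBUTES += [f"{name}_err" for name in list(ATTRIBUTES)]

def usable_for_training(obj):
    """Keep an object only if every mag_model and mag_psf value is below 99."""
    return all(obj[f"{kind}_{band}"] < 99 for kind in KINDS for band in BANDS)

def tree_configurations(rng, n_sets=10, n_attrs=4, n_perturb=10):
    """Return one (attribute subset, perturbation seed) pair per tree."""
    configs = []
    for _ in range(n_sets):
        subset = tuple(rng.sample(ATTRIBUTES, n_attrs))
        for _ in range(n_perturb):
            # Each tree gets its own seed, standing in for one random perturbation.
            configs.append((subset, rng.randrange(2**32)))
    return configs

def vote_probabilities(votes):
    """Turn per-tree class votes into the fraction of trees voting for each class."""
    counts = Counter(votes)
    return {label: n / len(votes) for label, n in counts.items()}

configs = tree_configurations(random.Random(42))  # 10 sets x 10 perturbations
probs = vote_probabilities(["galaxy"] * 96 + ["star"] * 4)
```

With these defaults @configs@ holds 100 entries, one per tree, and @probs@ maps @galaxy@ to 0.96 and @star@ to 0.04, matching the example in the text.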