SG separation challenge » History » Version 33

Ignacio Sevilla, 12/18/2014 02:36 PM

h1. SG separation challenge

Purpose and status as of July 20th 2014:

Rounds 1 and 2 served to verify that we can do better than the standard DESDM classifiers, at least in the fields we trained on. We are now moving beyond these fields, in particular to SPTE, and have applied some tests to understand the quality of the classification without truth values. We have found some puzzling behavior, especially for the stars, and we have to understand these features before providing the catalogs to the collaboration. Round 3 will therefore center on calibration fields closer to DES survey characteristics and on larger spectroscopic samples including more stars. We will also keep an eye on the particular observing conditions of the training fields and check whether regions of SPTE with similar conditions behave as expected. The goal is to provide, in the short term, one or more classifiers whose behavior is well backed up by plots and results from this challenge.

*[[des-sci-verification:SG_separation_challenge_details|Details and results]]*

Now that several people are testing their own approaches:

* Cut-based with DESDM info (Eli, Diego, Nacho, Ryan, William...).
* Multi-class (Maayane)
* Random Forests (Ryan)
* Boosted Decision Trees (Nacho, Alex)
* Alternative Neural Network with probabilistic output (Chris Bonnett).
* Probability based on spread model and photometry (DES-Brazil)
* Others...

I think the time is right and the codes are mature enough to launch a specific SG separation challenge, mimicking the successful photo-z WG exercise.

We have to establish:

* The training/validation/testing sample (COSMOS, others).
I have prepared a 70/30 training/testing split of the deep COSMOS field matched to ACS imaging. There are about 280 parameters; it is up to each tester to choose which to use.
Besides new datasets, we should also consider a shallower COSMOS and a fixed set of parameters, as Eduardo suggests. We also need to add SLR corrections, though I think this is not very important now.
* Only stars and galaxies? What about QSOs, image artifacts?
Star/galaxy for round 1.
* The metrics (Fixed cut, Fixed purity, Fixed Efficiency, ROC -- see example below).
I would prefer to use the ROC, i.e., the True Positive Rate vs False Positive Rate curve formed by varying the threshold (thanks Alex for pointing out the mistake in the previous ROC!).
* SVA1 systematics: correlations with depth, Galactic latitude, seeing, etc.
* Who/how to run it.
I suggest that each group provide an output file with id (or ra, dec in the first round) plus a galaxy probability or binary value.
* Is there any gain in combining them (a committee)?
* The schedule.
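As a rough illustration of the output-file and committee items above, the following sketch reads per-group files and averages the galaxy probability for objects that every group classified. The CSV layout, the column names @id@ and @prob_gal@, and the plain unweighted mean are all assumptions made for this example; the challenge itself does not fix any of them.

```python
import csv
import io


def read_submission(handle):
    """Read rows of (id, galaxy probability) into a dict keyed by object id.

    The "id,prob_gal" header is a hypothetical layout; a binary classifier
    would simply write 0 or 1 in the second column.
    """
    reader = csv.DictReader(handle)
    return {row["id"]: float(row["prob_gal"]) for row in reader}


def committee_average(submissions):
    """Unweighted mean galaxy probability over objects present in every submission."""
    common = set.intersection(*(set(s) for s in submissions))
    return {obj: sum(s[obj] for s in submissions) / len(submissions)
            for obj in sorted(common)}


# Two toy submissions standing in for files provided by two groups.
bdt = read_submission(io.StringIO("id,prob_gal\n1,0.9\n2,0.2\n3,0.6\n"))
rf = read_submission(io.StringIO("id,prob_gal\n1,0.7\n2,0.4\n4,0.5\n"))
combined = committee_average([bdt, rf])
print(combined)
```

Only objects 1 and 2 appear in both toy files, so only those two receive a committee value. Whether such a combination actually beats the best single classifier is exactly the open question in the list.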

h1. Comparison metrics

There are a number of metrics that can be used for comparing the performance of classifiers. Some especially useful metrics are those defined in the DES star/galaxy separation (on simulations) paper (arXiv:1306.5236) and the receiver operating characteristic (ROC), which is generally used for classifier comparison.

h2. Completeness and Purity provided by a given classifier

We define the parameters used to quantify the quality of a star/galaxy classifier. For a given class of objects, X (stars or galaxies), we distinguish the surface density of properly classified objects, N_X, from that of misclassified objects, M_X.

* The galaxy completeness c^g is defined as the ratio of the number of true galaxies classified as galaxies to the total number of true galaxies: c^g = N_G / (N_G + M_G).
* The stellar contamination f_s is defined as the ratio of the number of stars classified as galaxies to the total number of objects classified as galaxies: f_s = M_S / (N_G + M_S).
* The purity p^g is defined as 1-f_s
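The three definitions above can be written out directly from counts. This is a minimal sketch in which N_G is the number of true galaxies classified as galaxies, M_G the number of true galaxies classified as stars, and M_S the number of true stars classified as galaxies. The function name and the boolean-list inputs are assumptions of the example, and raw counts stand in for surface densities (the ratios are the same over a fixed area).

```python
def galaxy_metrics(is_galaxy_true, is_galaxy_pred):
    """Return (completeness c^g, stellar contamination f_s, purity p^g)."""
    n_g = m_g = m_s = 0
    for truth, pred in zip(is_galaxy_true, is_galaxy_pred):
        if truth and pred:
            n_g += 1          # galaxy correctly classified
        elif truth and not pred:
            m_g += 1          # galaxy misclassified as star
        elif not truth and pred:
            m_s += 1          # star misclassified as galaxy
    completeness = n_g / (n_g + m_g)
    contamination = m_s / (n_g + m_s)
    purity = 1.0 - contamination
    return completeness, contamination, purity


# Toy sample: four true galaxies and two true stars.
truth = [True, True, True, True, False, False]
pred = [True, True, True, False, True, False]
c_g, f_s, p_g = galaxy_metrics(truth, pred)
print(c_g, f_s, p_g)
```

The sketch does not guard against empty samples: with no true galaxies, or no objects classified as galaxies, the ratios are undefined and the divisions fail.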

Below are three different plots we suggest using to assess the performance of each classifier.

h3. Histograms

Example, on simulations, from arXiv:1306.5236

h3. Purity as a function of magnitude (for fixed completeness; the threshold/cut is left free)

!{width:300px}sg_separation_purity_vs_magauto_50.0_efficiency.png! !{width:300px}sg_separation_purity_vs_magauto_90.0_efficiency.png!
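One way such a curve could be computed is sketched below: in each magnitude bin the cut on the classifier output is chosen so that the requested fraction of true galaxies passes, and the purity is then measured for that cut. The function names, the half-open bins, and the convention that a higher score means "more galaxy-like" are assumptions of this example, not a description of the code that produced the plots.

```python
import math


def purity_at_fixed_completeness(scores, is_galaxy, target_completeness):
    """Galaxy purity at the cut that keeps at least the target fraction of true galaxies."""
    gal_scores = sorted((s for s, g in zip(scores, is_galaxy) if g), reverse=True)
    if not gal_scores:
        return None  # no true galaxies in this bin
    # Number of true galaxies that must pass the cut to reach the target.
    n_keep = max(1, math.ceil(target_completeness * len(gal_scores)))
    threshold = gal_scores[n_keep - 1]
    n_g = sum(1 for s, g in zip(scores, is_galaxy) if g and s >= threshold)
    m_s = sum(1 for s, g in zip(scores, is_galaxy) if not g and s >= threshold)
    return n_g / (n_g + m_s)


def purity_vs_magnitude(mags, scores, is_galaxy, bin_edges, target_completeness):
    """Apply the fixed-completeness purity separately in each bin [lo, hi)."""
    result = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        idx = [i for i, m in enumerate(mags) if lo <= m < hi]
        result.append(purity_at_fixed_completeness(
            [scores[i] for i in idx], [is_galaxy[i] for i in idx],
            target_completeness))
    return result


mags = [20.1, 20.5, 20.7, 20.9, 21.2, 21.4, 21.6, 21.8]
scores = [0.9, 0.8, 0.3, 0.1, 0.7, 0.6, 0.65, 0.2]
is_galaxy = [True, True, False, False, True, True, False, False]
purity = purity_vs_magnitude(mags, scores, is_galaxy, [20.0, 21.0, 22.0], 0.9)
print(purity)
```

In the toy data the faint bin contains a star scoring above the faintest galaxy, so its purity drops to 2/3 while the bright bin stays at 1. Because the threshold is refit in every bin, the curve reflects the classifier's separating power at that magnitude rather than the effect of a single global cut.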

h3. Completeness as a function of magnitude (for fixed purity; the threshold/cut is left free)

!{width:300px}sg_separation_efficiency_vs_magauto_95.0_purity.png! !{width:300px}sg_separation_efficiency_vs_magauto_99.0_purity.png!
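The complementary curve can be sketched in the same spirit: in each magnitude bin, every distinct score is tried as a cut, and the highest completeness among the cuts that meet the purity requirement is kept. The names, the binning, and the brute-force threshold scan are assumptions chosen for readability; a production version would sort once instead of rescanning the sample for every threshold.

```python
def completeness_at_fixed_purity(scores, is_galaxy, target_purity):
    """Best galaxy completeness among cuts whose galaxy purity meets the target."""
    n_gal_total = sum(1 for g in is_galaxy if g)
    if n_gal_total == 0:
        return None  # no true galaxies in this bin
    best = 0.0  # stays 0 if no cut reaches the requested purity
    for threshold in sorted(set(scores)):
        n_g = sum(1 for s, g in zip(scores, is_galaxy) if g and s >= threshold)
        m_s = sum(1 for s, g in zip(scores, is_galaxy) if not g and s >= threshold)
        if n_g + m_s > 0 and n_g / (n_g + m_s) >= target_purity:
            best = max(best, n_g / n_gal_total)
    return best


def completeness_vs_magnitude(mags, scores, is_galaxy, bin_edges, target_purity):
    """Apply the fixed-purity completeness separately in each bin [lo, hi)."""
    result = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        idx = [i for i, m in enumerate(mags) if lo <= m < hi]
        result.append(completeness_at_fixed_purity(
            [scores[i] for i in idx], [is_galaxy[i] for i in idx], target_purity))
    return result


mags = [20.2, 20.4, 20.6, 21.1, 21.3, 21.5, 21.7]
scores = [0.95, 0.9, 0.3, 0.8, 0.7, 0.75, 0.6]
is_galaxy = [True, True, False, True, True, False, True]
completeness = completeness_vs_magnitude(mags, scores, is_galaxy,
                                         [20.0, 21.0, 22.0], 0.9)
print(completeness)
```

In the faint toy bin a single high-scoring star forces a tight cut, so only one of the three galaxies survives at 90% purity.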

h2. Receiver operating characteristics

The receiver operating characteristic (ROC) provides another tool for evaluating the performance of classifiers. The ROC provides some information orthogonal to that in the completeness vs purity plots:

* Because ROCs compare the true positive rate to the false positive rate, they do not depend on the relative composition of the test sample. Thus, unlike the purity, they contain information only about the intrinsic performance of the classifier and not about the test sample.
* ROCs allow classifiers to be compared without requiring a threshold/cut to be placed on the output. This is useful because different projects have different requirements on object sample, completeness, purity, etc. The area under the ROC can serve as a very high-level scalar metric for classifier performance.
* Once a threshold/cut is placed, we can generate magnitude-dependent true positive rate vs false positive rate plots, which would be intrinsic to the classifiers.
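The points above can be illustrated with a small sketch that builds the ROC by sweeping the threshold over the classifier output and integrates it with the trapezoidal rule. Galaxies are taken as the positive class and a higher score is assumed to mean "more galaxy-like"; the function names and toy numbers are invented for the example.

```python
def roc_curve(scores, is_galaxy):
    """Return (FPR, TPR) points obtained by sweeping the threshold over all scores."""
    n_pos = sum(1 for g in is_galaxy if g)
    n_neg = len(is_galaxy) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, g in zip(scores, is_galaxy) if g and s >= threshold)
        fp = sum(1 for s, g in zip(scores, is_galaxy) if not g and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
is_galaxy = [True, True, False, True, False, False]
auc = area_under_curve(roc_curve(scores, is_galaxy))

# Duplicating every star changes the sample composition (and hence the purity
# at any cut) but leaves the ROC, and therefore its area, unchanged.
star_scores = [s for s, g in zip(scores, is_galaxy) if not g]
scores_more_stars = scores + star_scores
labels_more_stars = is_galaxy + [False] * len(star_scores)
auc_more_stars = area_under_curve(roc_curve(scores_more_stars, labels_more_stars))
print(auc, auc_more_stars)
```

The second half of the example is the composition argument in miniature: doubling the stars doubles both the false positives and the total number of stars, so the false positive rate at each threshold stays the same.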

h2. Summary of telecons

h3. July 10th 2014

What have we found
* 5 codes have been run on SVA1, based on round 2 training: 2 flavors of BDT, 2 flavors of Random Forests, TPZ.
* Machine learning methods seem more uncertain in assigning a class in SVA1 as a whole than in COSMOS (the training set is 90% COSMOS). TPZ is slightly less affected. Sample variance, the extra depth of COSMOS, or especially good conditions in COSMOS could be playing a role in this.
* The star number count distribution as a function of magnitude is very irregular for BDT and RF, and does not look like that of the training set or of other datasets. TPZ is somewhat more robust, up to a point. Modest keeps the shape.
* The galaxy number count distribution as a function of magnitude does not seem as affected (statistically the impact is smaller, though, even if some contamination is present).
* Eventually use multi-epoch spread_model information, as Erin is producing for WL.

Way forward: work towards round 3.
* Alex to identify physically motivated, robust parameters to use ([[des-sci-verification:Variables_for_SG_Separation|progress reported here]]). More contributions are welcome.
* Chris will create a new catalog with the chisq of the fit to templates, including star templates.
* Nacho will work towards creating the new training/test set using Eli's shallow coadds (currently testing and comparing them) and the new spectroscopic catalogs from Chris. Nacho: possibly incorporate systematics map info (for plotting the conditions of the training set; maybe too noisy to train on as well).
* Alex, Chris, Edward to eventually train on the new round 3 training set and run on SVA1 Gold (maybe separated by seeing conditions, with one subset that matches the training set?).

h3. December 18th 2014

- Round 3 submitted. SVA1 analyzed for modest, weighted average, TPC (TPZ for S/G classification). Not analyzed for BDT or Chris's codes.
- Tested TPC on stripe 82 with round 3 training vs modest, spread_model, weighted average spread_model.
- Tested on correlation functions in SVA1-SPTE area.
- Pending issues:
 * Doing better than spread_model in stripe 82.
 * Dealing with LMC stars.
 * Probabilistic output.

Where to go from here:
- Final round? Which tests/calibration?
- Color representativeness
- Settle on procedure to automate for forthcoming years.
- If results are good --> paper later next year.

Present: Alex, Chris, Edward, Maayane, Nacho

- Nacho: review of the end-of-year situation. Round 3 says that the ML codes are better than SExtractor's classifiers, including the weighted average spread_model variant. However, results are similar in terms of purity and completeness in the stripe 82 area. See the details page and the attached presentation (XXX_Argonne.pdf). Many tests were made of the impact on a particularly sensitive science case with SVA1 data: the determination of bias.
There is indeed an impact of 1-3 sigma on certain photo-z bins, vs modest_class.
- Maayane: suggests using pre-processed inputs via PCA or similar. Good results with Chris's Random Forests.
- Alex: before fine-tuning too much, we should figure out whether we can sacrifice some performance for good generalization.
- Chris: generalization to the SPTE area is not obvious, especially in the LMC (certain star colors may not be represented at all). See plots attached to the page (g_r_XXX.png and i_z_XXX.png).
- Solutions can range from reweighting at the training level to adapt to the application color space, to some sort of 'prior' approach in which we take into account the position of sources (e.g. if near the LMC, use only morphometry), to adding simulated or LMC datasets to the training.
- Chris: some ideas for the future: 4th moment information from object images (KIDs), and SG from images rather than catalogs (BTW, in fact spread_model has started pioneering this! Also I know Robert B has been working on it with a student here).

- Short term (till January): new TPC calibration (colors and weighted average spread_model) on COSMOS + spectroscopic fields, test on stripe 82. Make some tests to see if it can replace spread_model in the bias paper:
 * Purity and completeness as a function of photo-z
 * Star/galaxy ratio on SPTE as a function of photo-z
 * Color-color plots of calibration vs SPTE and stripe 82 vs SPTE.
 * (your idea here)
- Mid term (early April):
 * Fully develop the tests and procedures for SG classifier test, upload to repository.
 * Attack the color representativeness issue
 * Can we provide a Bayesian probabilistic output?
 * Catalog for collaboration
 * Paper!
- Long term:
 * Further improvements, Bayesian, 4th moment
 * QSOs: what has been done on this within the collaboration?

Not commented on telecon:
* New COSMOS Y1 and Y5 reruns soon
* Paper plans!

Not discussed: what about training on the colors of LMC stars that we can pick out by shape?