SG separation challenge
Now that several people are testing their own approaches:
- Cut-based with DESDM info (Eli, Diego, Nacho, Ryan, William...).
- Multi-class (Maayane)
- Random Forests (Ryan)
- Boosted Decision Trees (Nacho, Alex)
- Alternative neural network with probabilistic output (Chris Bonnett).
- Probability based on SPREAD_MODEL and photometry (DES-Brazil)
I think the time is right, and the codes are mature enough, to launch a dedicated star/galaxy (SG) separation challenge, modelled on the successful photo-z WG exercise.
We have to establish:
- The training/validation/testing sample (COSMOS, others).
I have prepared a 70/30 training/testing split of the deep COSMOS field, matched to ACS imaging. It contains about 280 parameters; each tester is free to choose which ones to use.
Besides new datasets, we should also consider a shallower COSMOS sample and, as Eduardo suggests, a fixed set of parameters. SLR corrections also need to be added, though I think they are not very important for now.
- Only stars and galaxies? What about QSOs, image artifacts?
Stars and galaxies only for round 1.
- The metrics (fixed cut, fixed purity, fixed efficiency, ROC; see the example below).
I would prefer ROC-style curves, i.e., completeness vs. purity traced out by varying the classification threshold.
- SVA1 systematics: correlations with depth, Galactic latitude, seeing, etc.
- Who runs it, and how.
I suggest that each group provide an output file with an id (or RA, Dec in the first round) plus a galaxy probability or a binary classification.
- Is there any gain in combining them (a committee)?
- The schedule.
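The sketch below illustrates the sample preparation and submission format suggested above. It is a minimal example, not the split that was actually prepared, and it rests on several assumptions:

- The matched COSMOS catalog provides a truth label per object (1 = galaxy, 0 = star).
- The 70/30 split is stratified by class. The original split may not be.
- The submission is a CSV with one row per object holding an id and a galaxy probability.

The function names, column names and random seed are hypothetical.

```python
import csv
import io

import numpy as np


def split_70_30(labels, train_frac=0.7, seed=42):
    """Hypothetical helper: stratified train/test split of object indices.

    Stars and galaxies are shuffled and split separately, so both subsets
    keep the overall star/galaxy ratio. A fixed seed makes the split
    reproducible for every tester.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)                      # in-place shuffle of this class
        n_train = int(round(train_frac * idx.size))
        train.append(idx[:n_train])
        test.append(idx[n_train:])
    return np.sort(np.concatenate(train)), np.sort(np.concatenate(test))


def write_submission(stream, ids, p_galaxy):
    """Hypothetical submission format: one CSV row per object (id, p_galaxy).

    A binary classification could be submitted the same way, using 0/1.
    """
    writer = csv.writer(stream)
    writer.writerow(["id", "p_galaxy"])
    for obj_id, p in zip(ids, p_galaxy):
        writer.writerow([int(obj_id), f"{float(p):.4f}"])


# Toy usage: 70 galaxies (label 1) and 30 stars (label 0).
labels = np.array([1] * 70 + [0] * 30)
train_idx, test_idx = split_70_30(labels)
print(len(train_idx), len(test_idx))  # 70 30

buf = io.StringIO()
write_submission(buf, [101, 102], [0.93, 0.051])
print(buf.getvalue())
```

Stratifying keeps the star fraction identical in both subsets, which matters because purity depends directly on how many stars are present. A round-1 submission keyed on RA, Dec instead of id would simply carry two coordinate columns.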
We suggest using the same metrics as in the DES star/galaxy separation paper based on simulations (arXiv:1306.5236).
Completeness and purity achieved by a given classifier
We define the quantities used to assess the quality of a star/galaxy classifier. For a given class of objects X (stars, s, or galaxies, g), we distinguish the surface density of correctly classified objects, N_X, from that of misclassified objects, M_X (true members of class X assigned to the other class).
- The galaxy completeness c^g is the ratio of the number of true galaxies classified as galaxies to the total number of true galaxies: c^g = N_g / (N_g + M_g).
- The stellar contamination f_s is the ratio of the number of stars classified as galaxies to the total number of objects classified as galaxies: f_s = M_s / (N_g + M_s).
- The galaxy purity p^g is defined as 1 - f_s = N_g / (N_g + M_s).
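Here is a minimal sketch of these definitions. It assumes each classifier outputs a galaxy probability p_galaxy, and that an object counts as classified as a galaxy when p_galaxy >= threshold. That convention and the function names are our own assumptions. Sweeping the threshold traces the completeness vs. purity curve discussed for the challenge.

```python
import numpy as np


def galaxy_metrics(is_galaxy, p_galaxy, threshold):
    """Return (c^g, f_s, p^g) for one threshold.

    is_galaxy: boolean array, True for true galaxies.
    p_galaxy:  classifier output; objects with p_galaxy >= threshold are
               classified as galaxies (assumed convention).
    Undefined ratios (empty denominators) are returned as NaN.
    """
    is_galaxy = np.asarray(is_galaxy, dtype=bool)
    called_galaxy = np.asarray(p_galaxy, dtype=float) >= threshold
    n_g = np.sum(is_galaxy & called_galaxy)    # N_g: galaxies classified as galaxies
    m_g = np.sum(is_galaxy & ~called_galaxy)   # M_g: galaxies classified as stars
    m_s = np.sum(~is_galaxy & called_galaxy)   # M_s: stars classified as galaxies
    completeness = n_g / (n_g + m_g) if n_g + m_g else np.nan
    contamination = m_s / (n_g + m_s) if n_g + m_s else np.nan
    purity = 1.0 - contamination               # p^g = 1 - f_s
    return completeness, contamination, purity


def completeness_purity_curve(is_galaxy, p_galaxy, thresholds):
    """Array of (completeness, purity) rows, one per threshold."""
    rows = []
    for t in thresholds:
        c, _, p = galaxy_metrics(is_galaxy, p_galaxy, t)
        rows.append((c, p))
    return np.array(rows, dtype=float)
```

For example, take truths [galaxy, galaxy, galaxy, star, star], probabilities [0.9, 0.8, 0.3, 0.6, 0.1] and a threshold of 0.5. Two galaxies are kept, one is lost and one star leaks in, so c^g = 2/3, f_s = 1/3 and p^g = 2/3. In machine-learning terms, completeness is the recall and purity is the precision. The swept curve is therefore a precision-recall curve rather than a classical ROC, which plots true-positive rate against false-positive rate.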
Below are the three plots we suggest using to assess the performance of each classifier.
Example, on simulations, from arXiv:1306.5236:
Purity as a function of magnitude (at fixed completeness; the threshold/cut is left free)
Completeness as a function of magnitude (at fixed purity; the threshold/cut is left free)
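The two per-magnitude plots above can be sketched as follows. The sketch assumes the same convention as before (galaxy if p_galaxy >= threshold) and half-open magnitude bins [lo, hi). In each bin the threshold is refit, i.e. left free, to reach the target completeness or purity, and the other quantity is reported.

The function names and the threshold-selection rules are our own illustrative choices, not necessarily those used in arXiv:1306.5236:

- For a target completeness, use the highest threshold that still keeps the target fraction of true galaxies.
- For a target purity, use the largest completeness among thresholds that meet it.

```python
import numpy as np


def purity_at_fixed_completeness(is_galaxy, p_galaxy, target_c):
    """Purity at the highest threshold giving completeness >= target_c."""
    p_gal = np.sort(p_galaxy[is_galaxy])[::-1]     # true galaxies, descending p
    if p_gal.size == 0:
        return np.nan
    # Number of galaxies that must be kept; the small tolerance guards
    # against floating-point round-up in target_c * size.
    k = max(1, int(np.ceil(target_c * p_gal.size - 1e-9)))
    threshold = p_gal[k - 1]
    called = p_galaxy >= threshold                 # at least k galaxies pass
    return np.sum(called & is_galaxy) / np.sum(called)


def completeness_at_fixed_purity(is_galaxy, p_galaxy, target_p):
    """Largest completeness over thresholds whose purity is >= target_p."""
    n_true = np.sum(is_galaxy)
    if n_true == 0:
        return np.nan
    best = np.nan
    for t in np.unique(p_galaxy):                  # candidate thresholds
        called = p_galaxy >= t                     # never empty: t is a data value
        purity = np.sum(called & is_galaxy) / np.sum(called)
        if purity >= target_p:
            c = np.sum(called & is_galaxy) / n_true
            best = c if np.isnan(best) else max(best, c)
    return best


def metric_vs_magnitude(mag, is_galaxy, p_galaxy, bin_edges, metric, target):
    """Apply metric(is_galaxy, p_galaxy, target) in each magnitude bin."""
    mag = np.asarray(mag, dtype=float)
    is_galaxy = np.asarray(is_galaxy, dtype=bool)
    p_galaxy = np.asarray(p_galaxy, dtype=float)
    out = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        sel = (mag >= lo) & (mag < hi)
        out.append(metric(is_galaxy[sel], p_galaxy[sel], target) if sel.any() else np.nan)
    return np.array(out, dtype=float)
```

Typical usage is `metric_vs_magnitude(mag, is_gal, p, edges, purity_at_fixed_completeness, 0.9)` for the first plot and the same call with `completeness_at_fixed_purity` for the second. Because the threshold is refit bin by bin, faint bins may need a very different cut than bright ones. The brute-force loop over unique probabilities is quadratic in the bin size; a sorted cumulative count would be faster for large catalogs.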