Version 16 (Ignacio Sevilla, 11/29/2013 09:27 AM) → Version 17/37 (Ignacio Sevilla, 12/09/2013 09:59 AM)

h1. SG separation challenge

Now that several people are testing their own approaches:

* Cut-based with DESDM info (Eli, Diego, Nacho, Ryan...).
* Multi-class (Maayane)
* Random Forests (Ryan)
* Boosted Decision Trees (Nacho)
* Alternative Neural Network with probabilistic output (Chris Bonnett).
* Probability based on spread model and photometry (DES-Brazil)
* Others...

I think the time is right, and the codes are mature enough, to launch a dedicated SG separation challenge, mimicking the successful photo-z WG exercise.

We have to establish:

* The training/validation/testing sample (COSMOS, others).
I have prepared a 70/30 training/testing split using the deep COSMOS field matched to ACS imaging. It has about 280 parameters; it is up to each tester to choose which ones to use.
Besides new datasets, we should also consider a shallower COSMOS sample, as well as a fixed set of parameters, as Eduardo suggests. SLR corrections also need to be added, though I do not think they are very important right now.
* Only stars and galaxies? What about QSOs, image artifacts?
Star/galaxy for round 1.
* The metrics (fixed cut, fixed purity, fixed efficiency, ROC; see example below).
I would prefer to use a ROC-style curve, i.e., the completeness vs. purity curve traced out by varying the classification threshold.
* SVA1 systematics: correlations with depth, Galactic latitude, seeing, etc.
* Who/how to run it.
I suggest that each group provide an output file with the object id (or ra, dec in the first round) plus a galaxy probability or a binary value.

* Is there any gain combining them (a committee)?
* The schedule.
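
The completeness vs. purity curve proposed in the list above could be built from each group's output file roughly as follows. This is only a sketch: the function name, the argument names and the convention that an object is called a galaxy when its probability is at or above the threshold are assumptions, not an agreed format. Completeness here is the fraction of true galaxies that are selected, and purity is the fraction of selected objects that are true galaxies.

```python
import numpy as np

def completeness_purity_curve(is_galaxy_true, p_galaxy, thresholds=None):
    """Return rows of (threshold, completeness, purity) for the galaxy class.

    is_galaxy_true : truth flags (True/1 for galaxies, False/0 for stars)
    p_galaxy       : galaxy probability reported by a classifier
    thresholds     : cuts to scan (assumed default: 101 values in [0, 1])
    """
    truth = np.asarray(is_galaxy_true, dtype=bool)
    prob = np.asarray(p_galaxy, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)

    n_true_gal = truth.sum()
    rows = []
    for t in thresholds:
        selected = prob >= t                 # objects classified as galaxies
        n_sel = selected.sum()
        n_g = (truth & selected).sum()       # true galaxies among them
        completeness = n_g / n_true_gal if n_true_gal else np.nan
        purity = n_g / n_sel if n_sel else np.nan   # undefined if nothing is selected
        rows.append((t, completeness, purity))
    return np.array(rows)
```

A binary classifier output corresponds to a single point on this curve rather than a full curve.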

h1. The metric

We suggest using the same metric as in the DES star/galaxy separation paper, which is based on simulations (arXiv:1306.5236).

h2. Completeness and Purity provided by a given classifier

We define the quantities used to assess the quality of a star/galaxy classifier. For a given class of objects X (stars or galaxies), we distinguish the surface density of correctly classified objects, N_X, from that of misclassified objects, M_X.

* The galaxy completeness c^g is defined as the ratio of the number of true galaxies classified as galaxies to the total number of true galaxies.
* The stellar contamination f_s is defined as the ratio of the number of stars classified as galaxies to the total number of objects classified as galaxies.
* The purity p^g is defined as 1 - f_s.
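
In terms of the counts above, these definitions read c^g = N_G / (N_G + M_G), f_s = M_S / (N_G + M_S) and p^g = 1 - f_s. A minimal sketch of how they could be computed from a truth flag and a binary classification is given here; the function and argument names are illustrative assumptions, and raw counts are used in place of surface densities, which gives the same ratios over a common area.

```python
import numpy as np

def galaxy_completeness_purity(is_galaxy_true, is_galaxy_pred):
    """Return (completeness c^g, stellar contamination f_s, purity p^g)."""
    truth = np.asarray(is_galaxy_true, dtype=bool)
    pred = np.asarray(is_galaxy_pred, dtype=bool)

    n_g = np.sum(truth & pred)     # galaxies classified as galaxies (N_G)
    m_g = np.sum(truth & ~pred)    # galaxies classified as stars    (M_G)
    m_s = np.sum(~truth & pred)    # stars classified as galaxies    (M_S)

    # Ratios are undefined (NaN) when the denominator is empty.
    completeness = n_g / (n_g + m_g) if (n_g + m_g) > 0 else float("nan")
    contamination = m_s / (n_g + m_s) if (n_g + m_s) > 0 else float("nan")
    purity = 1.0 - contamination
    return completeness, contamination, purity
```

For example, with four true galaxies of which three are recovered, and one star leaking into the galaxy sample, this gives c^g = 0.75, f_s = 0.25 and p^g = 0.75.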


h2. Plots

Below are three different plots that we suggest using to assess the performance of each classifier.

h3. Histograms

Example, on simulations, from arXiv:1306.5236

h3. Purity as a function of magnitude (for fixed completeness; the threshold/cut is left free)


!{width:400px}sg_separation_purity_vs_magauto_50.0_efficiency.png! !{width:400px}sg_separation_purity_vs_magauto_90.0_efficiency.png!
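
One possible way to produce such a curve from a probabilistic classifier is sketched below. It assumes that the threshold is tuned independently in each magnitude bin, which is one reading of "left free", and that objects with a probability at or above the threshold are called galaxies. The function name, the arguments and the binning are hypothetical. Because the threshold comes from an interpolated quantile, the target completeness is only reached approximately in sparsely populated bins.

```python
import numpy as np

def purity_at_fixed_completeness(mag, is_galaxy_true, p_galaxy, mag_edges,
                                 completeness=0.9):
    """Galaxy purity per magnitude bin, at (approximately) fixed completeness."""
    mag = np.asarray(mag, dtype=float)
    truth = np.asarray(is_galaxy_true, dtype=bool)
    prob = np.asarray(p_galaxy, dtype=float)

    purities = []
    for lo, hi in zip(mag_edges[:-1], mag_edges[1:]):
        in_bin = (mag >= lo) & (mag < hi)
        gal_prob = prob[in_bin & truth]
        if gal_prob.size == 0:
            purities.append(np.nan)          # no true galaxies in this bin
            continue
        # Cut that keeps roughly the top `completeness` fraction of true galaxies.
        threshold = np.quantile(gal_prob, 1.0 - completeness)
        selected = in_bin & (prob >= threshold)
        # At least one galaxy is always selected, so the denominator is > 0.
        purities.append((selected & truth).sum() / selected.sum())
    return np.array(purities)
```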

h3. Completeness as a function of magnitude (for fixed purity; the threshold/cut is left free)

!{width:400px}sg_separation_efficiency_vs_magauto_95.0_purity.png! !{width:400px}sg_separation_efficiency_vs_magauto_99.0_purity.png!
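
A sketch of the corresponding computation at fixed purity follows. As an assumption, the threshold is again chosen per magnitude bin: every distinct probability value in the bin is tried as a cut, and the highest completeness among the cuts that reach the required purity is kept. The names are hypothetical, and the brute-force scan is meant for clarity rather than speed.

```python
import numpy as np

def completeness_at_fixed_purity(mag, is_galaxy_true, p_galaxy, mag_edges,
                                 purity=0.95):
    """Best galaxy completeness per magnitude bin with purity >= `purity`."""
    mag = np.asarray(mag, dtype=float)
    truth = np.asarray(is_galaxy_true, dtype=bool)
    prob = np.asarray(p_galaxy, dtype=float)

    result = []
    for lo, hi in zip(mag_edges[:-1], mag_edges[1:]):
        in_bin = (mag >= lo) & (mag < hi)
        t_bin, p_bin = truth[in_bin], prob[in_bin]
        n_gal = t_bin.sum()
        if n_gal == 0:
            result.append(np.nan)            # no true galaxies in this bin
            continue
        best = 0.0                           # stays 0 if no cut is pure enough
        for threshold in np.unique(p_bin):
            selected = p_bin >= threshold    # never empty: threshold is in p_bin
            n_g = (selected & t_bin).sum()
            if n_g / selected.sum() >= purity:
                best = max(best, n_g / n_gal)
        result.append(best)
    return np.array(result)
```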