ArborZ Photometric Redshift for DES SV-A1

This page presents the current best results for SV-A1 photo-zs computed with ArborZ. ArborZ uses Boosted Decision Trees to produce a full p(z) distribution for each galaxy. These p(z)'s retain full information about the redshift distribution of each galaxy, unlike a single-number, "best-estimate" photo-z which must inherently discard or collapse information. As a result, we show that using p(z) reduces bias and more faithfully reconstructs the true underlying redshift distribution than a single-number, best-estimate photo-z can.

Training / Evaluation

To generate these photo-zs, we first took the DES SV-A1 catalog and computed calibrated photometry for each galaxy (absolute calibration was performed using Huan Lin's method). Extinction correction was performed on an object-by-object basis using SFD98 with the Schlafly et al. 2010 recalibration.

SV-A1 was then matched to 2dFGRS, ACES, GOODS, K20, OzDES, PRIMUS, SDSS, VIPERS, VVDS, and zCOSMOS. The resulting redshift catalog contains 81,423 objects out to redshift 1.2. We divide this into a training set, a random sample of 1/3 of the redshift catalog, and a target set containing the remaining objects, against which we evaluate. Only MAG_AUTO_[GRIZY] were used as training variables.
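The random 1/3 training split described above can be sketched as follows. This is a minimal illustration, not the ArborZ pipeline itself: the function name split_train_target, the seed, and the choice to address the matched catalog by row index are all assumptions made for the example.

```python
import numpy as np

def split_train_target(n_objects, train_fraction=1 / 3, seed=0):
    """Return (train_idx, target_idx): a random, disjoint partition of row indices.

    Hypothetical helper; the seed and the index-based interface are assumptions.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_objects)          # random ordering of all rows
    n_train = int(round(train_fraction * n_objects))
    train_idx = np.sort(shuffled[:n_train])        # random 1/3 used for training
    target_idx = np.sort(shuffled[n_train:])       # remaining objects used for evaluation
    return train_idx, target_idx

# 81,423 matched objects, as quoted in the text
train_idx, target_idx = split_train_target(81423)
print(len(train_idx), len(target_idx))
```

Keeping the two index sets disjoint matters here: any object shared between training and target sets would make the evaluation look better than it is.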


ArborZ produces a p(z) distribution for each galaxy in the target set, and plotting these versus true redshift gives the first plot above. We see that using p(z) produces a very clean, low-bias estimate of the true redshift.

It is still possible to collapse each p(z) into a single-number, "best-estimate" photo-z, but doing so necessarily discards important information about the redshift of a galaxy. When we plot this single-number, best-estimate photo-z against the true redshift, obvious biases appear:

Clearly, using a best-estimate, single-number photo-z introduces significant bias in the results; only retaining the full p(z) distribution can remove the shortcomings of a single-number approach.
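To make the loss of information concrete, the sketch below collapses a toy bimodal p(z) into three common single-number summaries. The page does not say which estimator is used for the best-estimate photo-z, so the mean, peak, and median here are illustrative assumptions, as are the grid and the toy p(z) itself.

```python
import numpy as np

def point_estimates(z_grid, pz):
    """Collapse one p(z), sampled on a uniform grid, into (mean, peak, median)."""
    pz = np.asarray(pz, dtype=float)
    weights = pz / pz.sum()                       # normalise to unit sum on the grid
    z_mean = np.sum(z_grid * weights)             # probability-weighted mean
    z_peak = z_grid[np.argmax(weights)]           # location of the highest bin
    cdf = np.cumsum(weights)
    z_median = z_grid[np.searchsorted(cdf, 0.5)]  # first grid point with CDF >= 0.5
    return z_mean, z_peak, z_median

# Toy bimodal p(z): 60% of the probability near z = 0.3, 40% near z = 0.9
z_grid = np.linspace(0.0, 1.2, 121)
pz = (0.6 * np.exp(-0.5 * ((z_grid - 0.3) / 0.03) ** 2)
      + 0.4 * np.exp(-0.5 * ((z_grid - 0.9) / 0.03) ** 2))

z_mean, z_peak, z_median = point_estimates(z_grid, pz)
print(z_mean, z_peak, z_median)
```

In this toy case the mean lands between the two peaks, where the p(z) is essentially zero, while the peak and median ignore the high-redshift solution entirely. No single number represents both modes.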

The "Residual" contours shown in the second plot are found by subtracting the true redshift from each p(z) and plotting the summed p(z)'s in each true redshift bin. The mean of the summed distribution in each true redshift bin corresponds to the bias, and its width to the error. Using p(z) produces only a small bias.
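The residual construction just described can be sketched as below, assuming each p(z) is tabulated on a common uniform grid. The function names, bin edges, and the toy Gaussian p(z)'s (with a deliberately injected offset of 0.02 and width of 0.05) are assumptions for illustration only, not ArborZ outputs.

```python
import numpy as np

def stacked_residuals(z_grid, pz, z_true, true_edges, resid_edges):
    """Sum p(z - z_true) over the galaxies in each true-redshift bin.

    Returns an array of shape (n_true_bins, n_resid_bins).
    """
    n_true = len(true_edges) - 1
    stack = np.zeros((n_true, len(resid_edges) - 1))
    bin_of = np.digitize(z_true, true_edges) - 1
    for p_i, zt_i, b in zip(pz, z_true, bin_of):
        if 0 <= b < n_true:
            # Shift the grid so that zero corresponds to the true redshift
            hist, _ = np.histogram(z_grid - zt_i, bins=resid_edges, weights=p_i)
            stack[b] += hist
    return stack

def bias_and_width(stack, resid_edges):
    """Mean (bias) and standard deviation (width) of each stacked residual."""
    centers = 0.5 * (resid_edges[:-1] + resid_edges[1:])
    total = stack.sum(axis=1)
    mean = (stack * centers).sum(axis=1) / total
    var = (stack * (centers - mean[:, None]) ** 2).sum(axis=1) / total
    return mean, np.sqrt(var)

# Toy data: Gaussian p(z)'s offset from the truth by 0.02, with sigma = 0.05
rng = np.random.default_rng(0)
z_grid = np.linspace(0.0, 1.6, 321)
z_true = rng.uniform(0.3, 1.1, size=2000)
pz = np.exp(-0.5 * ((z_grid[None, :] - (z_true[:, None] + 0.02)) / 0.05) ** 2)
pz /= pz.sum(axis=1, keepdims=True)

true_edges = np.linspace(0.3, 1.1, 5)
resid_edges = np.linspace(-0.5, 0.5, 101)
stack = stacked_residuals(z_grid, pz, z_true, true_edges, resid_edges)
bias, width = bias_and_width(stack, resid_edges)
print(bias, width)
```

With these toy inputs the recovered bias and width in every true-redshift bin should sit close to the injected 0.02 and 0.05. A true-redshift bin containing no galaxies would give an undefined mean, so real use would need to handle empty bins.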

However, if we also plot a line representing the bias using best-estimate, single-number photo-z on top of these contours, we can see the tremendous difference a p(z) approach makes:

We can also attempt to reconstruct the true redshift distribution from our photo-z results. We see that using p(z) faithfully reconstructs the true underlying redshift distribution, whereas using a best-estimate, single-number photo-z exhibits obvious bias:
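The two reconstruction approaches can be compared in a toy setting. The sketch assumes an idealised case in which every galaxy shares the same bimodal p(z) and true redshifts are drawn from that same distribution, so the p(z)'s are calibrated by construction. The peak of p(z) stands in for the best-estimate photo-z, which is an assumption since the page does not specify the estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
z_grid = np.linspace(0.0, 1.2, 241)
dz = z_grid[1] - z_grid[0]

def gauss(mu, sigma):
    return np.exp(-0.5 * ((z_grid - mu) / sigma) ** 2)

# Toy p(z): 70% of such galaxies sit near z = 0.3 and 30% near z = 0.8
posterior = 0.7 * gauss(0.3, 0.04) + 0.3 * gauss(0.8, 0.04)
posterior /= posterior.sum() * dz

n_gal = 5000
# True redshifts drawn from the same distribution (calibrated by construction)
z_true = rng.choice(z_grid, size=n_gal, p=posterior * dz)
pz = np.tile(posterior, (n_gal, 1))

edges = np.linspace(0.0, 1.2, 25)
centers = 0.5 * (edges[:-1] + edges[1:])
true_nz, _ = np.histogram(z_true, bins=edges, density=True)

# (a) Stack the p(z)'s, then rebin onto the same edges
stacked = pz.sum(axis=0) / n_gal
stacked_nz, _ = np.histogram(z_grid, bins=edges, weights=stacked, density=True)

# (b) Histogram the peak of each p(z)
z_peak = z_grid[np.argmax(pz, axis=1)]
peak_nz, _ = np.histogram(z_peak, bins=edges, density=True)

def high_z_fraction(nz):
    """Fraction of the reconstructed N(z) lying above z = 0.55."""
    return np.sum((nz * np.diff(edges))[centers > 0.55])

print(high_z_fraction(true_nz), high_z_fraction(stacked_nz), high_z_fraction(peak_nz))
```

In this toy case the stacked p(z) recovers the roughly 30% of galaxies at high redshift, while the peak-based histogram assigns every galaxy to the low-redshift mode and misses that population completely.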

Work is ongoing to improve these p(z) results further and to produce additional performance plots and metrics. A full SV-A1 redshift catalog will be posted here soon.


We present photometric redshifts for DES SV-A1 using ArborZ. ArborZ produces a full p(z) distribution for each galaxy, and using p(z) significantly reduces bias in photo-z computations. A best-estimate, single-number photo-z, on the other hand, yields biased results. It is therefore important to use the full information from p(z) when doing cosmological calculations, integrating the quantity of interest with p(z) where appropriate.
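As a minimal sketch of what integrating a quantity with p(z) means in practice, the example below compares the p(z)-weighted average of a function f(z) with the same function evaluated at a single best-estimate redshift. The function f(z) = (1 + z)^3 is a hypothetical stand-in for a real cosmological quantity, and the toy p(z) and the use of the peak as best estimate are likewise assumptions.

```python
import numpy as np

def expectation(f, z_grid, pz):
    """Approximate the integral of f(z) p(z) dz on a uniform grid.

    p(z) is renormalised to unit sum on the grid before weighting.
    """
    pz = np.asarray(pz, dtype=float)
    weights = pz / pz.sum()
    return np.sum(f(z_grid) * weights)

def f(z):
    # Hypothetical nonlinear quantity of interest
    return (1.0 + z) ** 3

# Toy bimodal p(z): 70% near z = 0.3, 30% near z = 0.8
z_grid = np.linspace(0.0, 1.2, 241)
pz = (0.7 * np.exp(-0.5 * ((z_grid - 0.3) / 0.04) ** 2)
      + 0.3 * np.exp(-0.5 * ((z_grid - 0.8) / 0.04) ** 2))

full = expectation(f, z_grid, pz)      # uses the whole p(z)
z_peak = z_grid[np.argmax(pz)]
collapsed = f(z_peak)                  # uses only the single-number estimate
print(full, collapsed)
```

For a nonlinear f and a multi-modal p(z), the two numbers differ substantially, which is the practical reason to carry the full p(z) through the calculation rather than collapsing it first.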