Dual image object association

work by Eric Suchyta, Eric Huff, Peter Melchior

Executive summary

  • There are enormous positional offsets (up to ~30') between filters in the coadd object table. We believe these come from objects that are very faint in at least one of the filters considered, so that their measured positions have nothing to do with the location of the initial detection (from the riz coadd).
  • There are also substantial shifts in magnitudes (up to ±5) between the true value of a Balrog object and what SExtractor reports for it. We believe that objects measured brighter than they should be are blended with existing objects without SExtractor noticing, and that objects that come out fainter may have been placed in areas with substantial masking (in the form of weight=0).
  • There is a bug in SExtractor's clean method that merges potentially independent objects together. Fragmentation occurs for about 10% of all objects. Most of these cases are a large number of 1-pixel "objects" being merged with bright stars. A smaller fraction consists of smaller objects that are merged with a small number of other, similarly sized objects, which should negatively impact measured colors and other properties.

The problem

In Balrog tests, we found a population of objects for which the simulated and recovered properties are strongly discrepant, e.g. a simulated galaxy with true magnitude 26 that is measured at mag 22. Using the tile viewer, we could confirm that the brighter object actually does exist. However, the positional offsets between the input coordinates of the mock object and those measured by SExtractor had excursions up to dozens of arcmin. This behavior persists in catalogs even when blends are excluded (flags = 0). So the problem is that distinct objects are being associated with each other.

For the coadd catalogs, SExtractor runs in dual image mode, using the riz coadd as detection image. This is different from using the association mode, which requires a reference catalog and performs object matching in catalog space. Instead, dual image mode simply performs the measurements in the analysis image on the set of pixels (for each object) found in the detection image, basically a sort of forced photometry with a fixed aperture. To be abundantly clear: in dual image mode, no cutoff criterion for object association is considered (such as the default 2-pixel centroid offset), meaning that potentially very distant objects end up being associated with each other.
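To make the distinction concrete, here is a toy sketch of the two behaviors. It is not SExtractor's code. The function names, the nested-list image representation, and the use of 0 for "no object" in the segmentation map are assumptions made for illustration.

```python
import math

def forced_flux(analysis_image, segmap, object_id):
    """Dual-image style: sum analysis-image pixels over the detection footprint.

    No check is made that the analysis image contains the same object there.
    """
    return sum(
        analysis_image[y][x]
        for y, row in enumerate(segmap)
        for x, seg_id in enumerate(row)
        if seg_id == object_id
    )

def passes_association_cut(det_xy, meas_xy, max_offset_pix=2.0):
    """Catalog-space style: accept a match only if centroids agree within a cutoff."""
    dx = meas_xy[0] - det_xy[0]
    dy = meas_xy[1] - det_xy[1]
    return math.hypot(dx, dy) <= max_offset_pix

# Tiny example: object 1 occupies two pixels of the detection footprint.
segmap = [[0, 1],
          [1, 2]]
analysis = [[5.0, 1.0],
            [2.0, 7.0]]
flux_obj1 = forced_flux(analysis, segmap, 1)
```

The first function never rejects anything, which is the point made above: without a cutoff like the one in the second function, whatever lands in the detection footprint is attributed to the object.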

Here are the plots for positional offsets between g and r, r and i, r and z from DESDM...

Histograms of differences between g- and i-band positions in Y1A1_coadd_objects table. Catalogs are cut to remove anything with flags_g >0 and flags_i > 0. All three histograms are normalized to integrate to unity. The cumulative distribution, right, shows that ~5% of the Y1A1 coadd objects have astrometric g-i offsets greater than 1".
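For anyone who wants to reproduce the offset statistic from a catalog dump, a minimal sketch follows. The dictionary keys (ra_g, dec_g, ra_i, dec_i, flags_g, flags_i) are hypothetical stand-ins for the table's g- and i-band windowed positions and flags, not the actual column names. Positions are assumed to be in degrees.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for very small separations.
    h = (math.sin((dec2 - dec1) / 2.0) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0

def fraction_offset_above(rows, threshold_arcsec=1.0):
    """Fraction of unflagged objects whose g-i offset exceeds the threshold."""
    clean = [r for r in rows if r["flags_g"] == 0 and r["flags_i"] == 0]
    if not clean:
        return float("nan")
    n_far = sum(
        separation_arcsec(r["ra_g"], r["dec_g"], r["ra_i"], r["dec_i"]) > threshold_arcsec
        for r in clean
    )
    return n_far / len(clean)
```

Flagged objects are dropped before the fraction is computed, so the denominator matches the flag cut described in the caption.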

The figures below provide evidence of the effects on the photometry.

Discontinuous segmentation maps

But how can such associations be made? A very old bug in SExtractor results in the presence of disjoint pieces of the segmentation map that are assigned to one object. For instance, the red object in the segmentation map has the same ID 234200 for the big object and the tiny speck below it:

riz detection image from the v4 SV pointed cluster coadd of Bullet, which employed a setup very similar to DESDM

But since they are obviously not the same object, the dual image mode will force the analysis to use pixels that actually belong to two independent objects. This will have undesired consequences for photometry (among other parameters), effectively introducing additional scatter from merging two independent objects. The piece of code in SExtractor that causes the merging is in the clean function. In particular, line 121 in clean.c of the v2.18.10 code base states that the spatial distance for merging is set by 10 times the sum of the semi-major axes of both objects, which can be a very large reach for a big galaxy. Since such galaxies are rare, large offsets should be rare.
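To get a feeling for the numbers, the sketch below takes the "10 times the sum of the semi-major axes" statement at face value and converts it to an angular reach. It does not reproduce the actual logic in clean.c. The pixel scale of 0.263 arcsec/pixel is an assumption that is not stated in this note, and the function names are made up for illustration.

```python
PIXEL_SCALE = 0.263  # arcsec/pixel; assumed value, not taken from this note

def merge_reach_arcsec(a_parent_pix, a_other_pix, factor=10.0, pixel_scale=PIXEL_SCALE):
    """Reach = factor * (sum of semi-major axes in pixels), converted to arcsec."""
    return factor * (a_parent_pix + a_other_pix) * pixel_scale

def axes_sum_needed_pix(distance_arcsec, factor=10.0, pixel_scale=PIXEL_SCALE):
    """Sum of semi-major axes (in pixels) needed to reach a given distance."""
    return distance_arcsec / (factor * pixel_scale)

# A semi-major axis of 20 pixels paired with a 1-pixel speck:
reach = merge_reach_arcsec(20.0, 1.0)
```

Under these assumptions, reaching a fragment 20 arcmin away would require semi-major axes summing to roughly 450 pixels, so the rule by itself only explains arcmin-scale reaches for very extended objects.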

So, how often does that happen?

The fraction of objects with at least 2 disjoint groups of pixels in the segmentation map is about 10%; some of them have dozens or even hundreds of fragments. These extreme cases are all very bright objects that get split into chunks (probably because the local background is elevated), and these chunks are often just one pixel large. The weird thing is: 1-pixel objects should not exist due to the MINAREA requirement of several pixels above threshold. Anyway, merging such pixels back with the parent objects doesn't do any harm since they most likely came from the wings of that object anyway.
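The fragment count per object can be obtained from a segmentation map with a connected-component pass restricted to pixels that share an ID. The sketch below is a plain-Python illustration, not the code used for the numbers above. It assumes the map is a nested list of integer IDs with 0 as background, and it treats diagonal neighbors as connected (8-connectivity), which is a choice made here.

```python
from collections import Counter, deque

def count_fragments(segmap):
    """Return {object_id: number of disjoint 8-connected pixel groups}."""
    ny = len(segmap)
    nx = len(segmap[0]) if ny else 0
    seen = [[False] * nx for _ in range(ny)]
    fragments = Counter()
    for y0 in range(ny):
        for x0 in range(nx):
            obj_id = segmap[y0][x0]
            if obj_id == 0 or seen[y0][x0]:
                continue
            # New, unvisited group of this ID: flood-fill it.
            fragments[obj_id] += 1
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and not seen[yy][xx]
                                and segmap[yy][xx] == obj_id):
                            seen[yy][xx] = True
                            queue.append((yy, xx))
    return dict(fragments)

def fragmented_fraction(segmap):
    """Fraction of object IDs that have at least 2 disjoint pixel groups."""
    counts = count_fragments(segmap)
    if not counts:
        return 0.0
    return sum(n >= 2 for n in counts.values()) / len(counts)
```

For full coadd tiles a pure-Python loop like this would be slow; it is only meant to pin down what "disjoint groups of pixels" means.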

It gets more problematic for the cases with only a few fragments because they often come from distinct objects (such as the example above with 2 fragments). The right panel shows the size of the areas in the segmentation maps (radius of a circle with the given number of pixels) of objects with multiple fragments, ordered by the size of the largest fragment (let's call it the Parent). I (Peter) should have color-coded the plot; it would then be obvious that the long tail to the right consists entirely of huge, bright objects that are massively fragmented. Much more problematic is the bulk of objects below 5'', which tend to have fragments of similar size to the parent. Since (according to visual inspection) they often come from distinct objects, their colors will be off. Given that the fragments have similar size, it stands to reason that they may be similarly bright, rendering this potentially devastating for the measured properties. It would be preferable not to merge these fragments back, but this requires a change to the SExtractor clean method.
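The size measure used in that panel, the radius of a circle with the same number of pixels, is r = sqrt(N / pi) times the pixel scale. A small sketch follows. The pixel scale default of 0.263 arcsec/pixel is an assumption, and the size-ratio helper is a hypothetical convenience for flagging fragments comparable to their parent.

```python
import math

def equivalent_radius_arcsec(n_pixels, pixel_scale=0.263):
    """Radius (arcsec) of a circle covering n_pixels; pixel_scale is assumed."""
    return math.sqrt(n_pixels / math.pi) * pixel_scale

def fragment_size_ratio(fragment_pixel_counts):
    """Radius ratio of the second-largest fragment to the largest (the Parent).

    Returns 0.0 when there is only one fragment. Values near 1 indicate
    fragments of similar size, the case argued above to be most harmful.
    """
    sizes = sorted(fragment_pixel_counts, reverse=True)
    if len(sizes) < 2:
        return 0.0
    return math.sqrt(sizes[1] / sizes[0])
```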

Colors vs positional offsets

Regardless of mechanism, the cross-band identification failures indicated by these large positional offsets have significant effects on DES photometry. The plots below show correlations between astrometric offset and magnitude measurements. We chose to compare g- and i-band photometry because g is not included in the riz detection image coadd. For all of the plots below, we have used the entire SVA1 coadd object table, excluded only things with SExtractor flags > 0 in the g or i bands.

Joint distribution of g-i position differences in SVA1_coadd_objects table and measured galaxy g- and i-band model, detmodel, and auto magnitudes. Vertical red lines are drawn at 1" offsets. Similar effects appear in the Y1A1 table.

Joint distribution of g-i position differences in SVA1_coadd_objects table and measured galaxy g-i colors, measured using mag_auto. Red line shows the median g-i color for this sample. Note especially the flare outwards in the envelope of the full distribution once astrometric offsets approach 1". Similar effects appear in the Y1A1 table.

Map of Euclidean separations between SVA1 g- and r-band positions
Finally, we show here a map of the mean spatial distribution of g-i astrometric offsets in cells, for everything in SVA1_COADD_OBJECTS; the color scale is log10 (Euclidean, L2 norm) g and i separation on the survey footprint.

Examples of 10 arcmin offsets in the coadd catalogs

We thought it might be instructive to look at a number of obvious failures by selecting objects whose g and i band centroids (ALPHAWIN/DELTAWIN) are offset by about 10 arcmin (not a typo: it's arcmin).

First case: the position in the i-band, which was part of the detection image, and in the g-band

There's a tiny speck in the z-band at that location that may have been detected, but how SExtractor could have associated these two is really difficult to comprehend.
If we continue to work under the assumption that this is caused by the "disjoint segmentation map bug" then this position actually would only accidentally correspond to an object. Instead, the other fragment from the segmap would move the centroid somewhere along the line between the parent object and the merged fragment. However, this would mean that the fragment needs to be ~20 arcmin away.
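One way to read the ~20 arcmin figure is through a weighted centroid of two pieces: if both carry equal weight, the centroid sits at the midpoint, so a 10 arcmin shift needs a fragment 20 arcmin from the parent. The sketch below illustrates that arithmetic only. It uses a simple flux-weighted mean in flat coordinates, which is a simplification and not SExtractor's windowed centroid algorithm.

```python
def merged_centroid(pieces):
    """Flux-weighted centroid of (x, y, flux) pieces in any flat coordinate unit."""
    total = sum(flux for _, _, flux in pieces)
    if total == 0:
        raise ValueError("total flux is zero; centroid undefined")
    x = sum(px * flux for px, _, flux in pieces) / total
    y = sum(py * flux for _, py, flux in pieces) / total
    return (x, y)

# Parent at the origin, fragment 20 arcmin away along x, equal fluxes:
equal_case = merged_centroid([(0.0, 0.0, 1.0), (20.0, 0.0, 1.0)])

# A fragment three times fainter than the parent pulls the centroid less:
faint_case = merged_centroid([(0.0, 0.0, 3.0), (20.0, 0.0, 1.0)])
```

In this simplified picture a fainter fragment would have to sit even further away to produce the same shift, which makes the merging explanation harder to sustain for these cases.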

Another case, where both locations in the coadd table do not appear to be related to any real object: i-band vs g-band

I've also seen cases where the two locations correspond to two real, but certainly distinct, objects. Again, without knowing what SExtractor detected, it is not certain what to make of such cases; they could be entirely accidental.

There are a few important aspects here:
  • It could be that these huge offsets correspond to marginally detected objects only, where SExtractor found an initial set of pixels in the detection image, but in one or several of the analysis images the object is too faint for a centroid measurement. So, the windowed centroids end up anywhere.
  • If the flux is determined only from the pixels found in the detection image without considering an updated centroid (this is how AUTO and DETMODEL fluxes should do it), then this is basically forced photometry, and no real problem occurs. Fluxes are OK, only single-filter positions would be off. If, on the other hand, the flux is measured at the new (measured) centroid, then things will get nasty. MODEL magnitudes could be affected by this problem, but we don't know what centroids are used for each of the filters (they are not stored in the DB).
  • Given that the detection image is an riz coadd, any single filter (even r,i,z) might have a drop-out at the location of the riz-detected object. This means that the coadd table should store the centroid of the riz image as the "official" coordinates of the object to maintain consistency. We do not think this is done because there is no detection catalog stored anywhere. Instead, Eli mentioned that the final centroid coordinate could come from the i-band image only.


Eli asked a good question, which is whether position offsets we're seeing are actually affecting the magnitude measurements. To test this, I cut to objects with g- and i-band magnitudes both brighter than 22.5 (none of these should be marginal detections). The plot below shows the joint magnitude-offset distribution of objects in this category.

magnitude-vs-offset for r<22.5 & g<22.5

Note that the tail to very high offsets persists.

position offset coded by magnitude errors

comparison of populations of fragmented and offset objects
Comparison of i-band distributions of objects with fragmented segmentation masks, and objects with large (>1") g-r offsets. This is from the OSU SVA1 coadds, which are processed in a similar, but not identical, manner to the DESDM catalogs.

  • The top two panels show (measured - truth) for i- and g-band Balrog galaxies as a function of the difference in position between the input and output positions in those bands.
  • The bottom two panels show the same thing, but normalizing each magnitude difference by the measurement error reported for that object.

We're only using objects with flags=0 in both the g and i bands. The color scaling in each histogram is logarithmic.
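The normalized quantity in the bottom panels is (measured - truth) / reported error. As a sketch of how one might summarize it per decade of position offset (0.01-0.1", 0.1-1", and so on), the snippet below bins objects by floor(log10(offset)) and reports the median absolute normalized error. The tuple layout of the input is an assumption for illustration and does not reflect the Balrog table schema.

```python
import math
import statistics
from collections import defaultdict

def normalized_error(mag_measured, mag_truth, mag_err):
    """(measured - truth) in units of the reported magnitude error."""
    return (mag_measured - mag_truth) / mag_err

def median_abs_error_by_decade(objects):
    """objects: iterable of (offset_arcsec, mag_measured, mag_truth, mag_err).

    Returns {decade: median |normalized error|}, where decade = floor(log10(offset)),
    e.g. -2 covers 0.01-0.1 arcsec and -1 covers 0.1-1 arcsec.
    Entries with non-positive offset or error are skipped.
    """
    bins = defaultdict(list)
    for offset, measured, truth, err in objects:
        if offset <= 0 or err <= 0:
            continue
        decade = math.floor(math.log10(offset))
        bins[decade].append(abs(normalized_error(measured, truth, err)))
    return {decade: statistics.median(vals) for decade, vals in sorted(bins.items())}
```

If the reported errors were accurate, the median absolute value would be of order unity in every bin; values well above that indicate under-estimated errors.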

It would appear that, for objects with very large g- and i-band position problems, SExtractor is doing a good job of warning us that the photometry is bad; as Eli suggested yesterday, the photometry for these guys is more or less fine, as long as you take the reported errors seriously.

However, objects with position offsets in the 0.1-1" range seem to have dangerously large photometry errors. And there's a population of things with exceptionally well-measured positions (0.01-0.1" offsets) whose magnitude errors are drastically under-estimated. Peter suggests that these may be objects that fall across mask boundaries (but are somehow not flagged).

Here we excluded all Balrog objects inserted within 2" of a known DES source in either band. This removes most of the aforementioned issues, which suggests they're a blending problem; however, since we've already excluded objects flagged as blends, this suggests that there's a significant contaminant population of stealth blends, extending to very bright magnitudes.
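The isolation cut can be sketched as a brute-force neighbor check. This is an illustration under stated assumptions, not the code behind the plots. Positions are (ra, dec) tuples in degrees, the separation uses a small-angle approximation, and RA wrap-around at 0/360 is ignored. For real catalog sizes a spatial index would be needed instead of the double loop.

```python
import math

def sep_arcsec(p, q):
    """Small-angle separation in arcsec between two (ra, dec) pairs in degrees.

    Ignores RA wrap-around at 0/360 deg; adequate only for nearby pairs.
    """
    dec_mid = math.radians(0.5 * (p[1] + q[1]))
    d_ra = (p[0] - q[0]) * math.cos(dec_mid)
    d_dec = p[1] - q[1]
    return math.hypot(d_ra, d_dec) * 3600.0

def isolated_only(balrog_positions, known_positions, radius_arcsec=2.0):
    """Keep Balrog positions with no known source within radius_arcsec."""
    return [
        b for b in balrog_positions
        if all(sep_arcsec(b, k) > radius_arcsec for k in known_positions)
    ]
```

In practice the check would be run against the known-source positions in each band, keeping only objects that pass in both.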

Finally, we looked at what happens when you limit yourself to things with input magnitudes g<23 and r<23:

Here again, there's a (small) tail of heavily over-estimated magnitudes for position offsets of ~1". This histogram is much less well-populated, though, so I'm not sure if this is actually a smaller fraction of the total population.

The Balrog tables are in the database, if anyone wants to look at them; there are probably some plots you'd like to see that I haven't included.