# Copy of DES-2018-0389 Comments from IRC and CWR for DES publication 2018-0389¶

## Simon Birrer¶

Maybe add a reference to the Hubble constant measurement from the relative time delays of gravitationally lensed quasar images. The latest work is presented in arXiv:1809.01274.

AP: added

## D. Scolnic - Nov. 1¶

-I will have a full read in the next week, but wanted to start the discussion about the prior. I don't understand the justification for the narrow H0 prior considering this analysis is measuring H0 directly.

This statement of

"Our maximum a posteriori estimate of the Hubble constant is H0 = 69.6 +13.0 −12.9 km s−1 Mpc−1 from photo–z only, with a uniform prior on H0 between 50 and 90 km s−1 Mpc−1 . We find H0 = 75.2 +39.5 −32.4 km s−1 Mpc−1 using the more conservative flat prior between 20 and 140 km s−1 Mpc−1” and other parts of the paper show that the prior is doing the majority of the work. In the Fishbach paper referred to, you say they have a prior of [10,220] - this seems much more reasonable as it is effectively not introducing an H0 constraint. Why not do the same?

AP: We believe that [20,140] is already a huge prior range. As large scale structure along the line of sight is what really matters in this measurement, one could make the prior range indefinitely large and still get some information in the posterior. But at the same time one has to be careful about the completeness of the sample used: the further we go in H0, the further we need to go in redshift to include the relevant galaxies for our prior range. While it is not a problem to create a nice volume limited sample of galaxies that spans the 90 percent LIGO probability volume out to H0=220 km s−1 Mpc−1, we would need to cut more fainter galaxies from the sample as the luminosity cut would need to get brighter to ensure we have a volume limited sample. This measurement is not however interesting for single events, but rather for 100 of them as we show in the paper, and at that level of precision this choice of prior will not effectively be introducing a constraint.

On another note, in Fishbach at al they use the GLADE catalog which is super shallow, and then introduce some completeness weighting that does not take into account the clustering of galaxies. I would rather keep our very complete galaxy sample and avoid introducing these weighting we wouldn't understand just to go further in H0. Citing a collaborator during the discussion we had about the current prior choice "Not even de Vaucouleurs would have chosen this prior!"

DS: Hey thanks for this comment. A few things about this - first, if you are ok with 20,140, what is the justification for 50,90?

AP: The justification is that some of us felt that a [20,140] prior was too wide, so we decided to satisfy everyone and include both results.

It is said "The narrower but more sensible range, is expected to produce a prior–dominated result for one event. This effect diminishes over more events as the precision improves, and both priors become mostly uninformative." While this is kind of true, if ultimately one is root-N'ing down some large uncertainty, having the uncertainty for one event be 47.8% versus 18.6% (as shown in Table 3) should still matter a lot. If it doesn't, where is this shown in the paper? What makes me also nervous about this prior thing is that as Table 3 shows, when the prior is increased by 3x, the uncertainty on H0 goes up by 3x. I would think that the GW-gal signal has more pull than that.

AP: You are right, it is not shown in the paper that for 100 events it doesn't matter which of the large priors is chosen. We made runs on simulations, and we found comparable results for 100 events. This is only briefly mentioned in the text, so if you believe this should be highlighted more or it is not clear we can work on it. [DS: Yes - I think that is needed] I believe that what is causing the uncertainty to increase with the prior range is the reason that I was giving above for the choice of this range of prior. One could make the prior range indefinitely large and still get some information in the posterior, just because the large scale structure is there. For a wider prior, more clustered structure will contribute to the signal, by contributing with several peaks in the dn/dz that can show up as several peaks in the H0 posterior, which can broaden and merge depending on the uncertainty on dL, z and Omega. Eventually one could imagine to make a measurement of H0 using a "wrong" prior for one event, say [200-300], but the combined posterior will be uninformative after enough events.

Second, in Fig. 3, the posterior after applying the two different flat priors is used. How is it that the shape of the posteriors looks totally different even though all that was applied was a flat prior? This seems quite strange.

AP: As I mentioned above, the galaxy sample has to be cut at some point. We found that, after runs on simulations, a reasonable cut was at 90 per cent of the probability from the localization volume. So that is consistently done into angular and distance space. If we want to think about the galaxies being a prior on the true source position, then this prior needs to be consistent with the H0 prior. Galaxies that are too close/far away to be the host given a prior need to be cut out or they will introduce noise due to the tails of both the p(z_obs|z) and p(d_gw|dL) distributions. I cannot tell about how realistic the Gaussian p(d_gw|dL) distribution actually is, but I can tell you that a Gaussian distribution is not great for the photo-zs. However, this is not important for a measurement of the kind 70+-30 km/s/Mpc, but we need to address these systematics before hoping to do any precision cosmology. Finally, if we knew the redshifts perfectly, then I would agree, I would have expected the two posteriors to look very similar (apart from some clustering that may also affect the posterior when part of the volume is cut out). But the EM redshift likelihoods have tails that will contribute over a range of H0 values, and so lower and higher z galaxies that are excluded from the 50-90 prior, will contribute in the 50-90 range when we use the wider prior. The reason is that the dN/dz distributions in the 2 cases are different, even if you look at the same redshift range of interest for H0 [50,90] km s-1 Mpc-1. What I mean is that if you look at Fig 1, with the large prior we get a dN/dz that looks like the black line. Now, if we cut the grey histogram redshift, and resample the p(z) from this new galaxy sample, we may be getting a new black line more peaked around ~0.12 because we lose the contribution of the low and high z galaxies tails that was making up the original black line. Such contribution will depend on the width and shape of the single galaxies’ p(z)s (and that is in general larger at larger zs). So I believe that is totally reasonable that there is a bit of shift in the dn/dz that is reflected in the H0 posterior. We made this clearer in the text too now.

Third, can you report the uncertainties on H0 when only the flat prior is used (no information from galaxies/GW event)? That would be very helpful to see in the paper.

AP: OK, added to Table 1. [DS: Great, can you repost to docdb?] AP: Of course, we will post a new version after having responded to most of the comments here.

I don't think the de Vaucouleurs comment is quite fair because this paper is trying to measure H0, rather than use some prior on H0 for something else.

AP: That's true, but mostly we want to show that we can use this cosmological probe in the future, we knew that the outcome of this particular analysis would have not provided anything too interesting.

And lastly, I think it would be good to try out a prior not centered on H0=70. If prior was [60,120] - would H0 come out as 90?

Thanks!

AP: I find: 73.2 + 33.3 - 4.8 km/s/Mpc . I believe that the choice of the prior and the effect of choosing the relevant 90 per cent volume needs to be addressed in future papers that deal with the systematics of this probe.

[DS: Interesting - yeah, I think maybe my larger issue is that the abstract is painting this as if the paper has measured H0 from this one event to +-13, whereas really this is mostly just measuring the prior. When you say "consistent with SN Ia and CMB measurements of the Hubble constant. " - I could say that this is true, but given the larger prior as shown in Table 1, it is also consistent (2.3sigma) with H0=0 - a non-expanding universe. If as you say above that the prior becomes less important with multiple events, I would think the result is about using this one event to say what H0 could be with 100, but quoting an H0 result from this one event seems like just asking for trouble from a referee.]

AP: Thanks for your useful feedback Dan. I think we could mention the result with the larger prior in the abstract, and say something about the importance of the prior, what do you think? I suggest we discuss this with everyone else at the telecon if you will be able to attend.

## Daniel Gruen, Nov 4¶

Just some curious questions / comments

- How is the analysis affected by the possibility that the event happened in an unobserved galaxy (due to masking, or faintness of the galaxy)? You mention that you capture 77% of the stellar mass. Could you discuss how the missing stellar mass affects your result?

AP: How much our result changes depends on the clustering of the host galaxy. What we could test, was host galaxies fainter than our limit by up to ~1 magnitude (because that is how fainter we could go in the simulations). When placing events on these galaxy, the posterior were not in average significantly affected, when looking at 100 simulated events. - When you define the Omega_i, Omega_GW as a "solid angles" I am confused. Is this just a celestial coordinate?

AP: It is a solid angle in the sky, corresponding to the angle subtended by each healpix pixel (as we work with healpix here), with a direction. So yes, it is a celestial coordinate, with a corresponding area which is uniform over all pixels. We have added this to the text for clarity, with a \hat for \Omega to show that it is a vector pointing to the position of the pixel. If one wants to use RA,Dec, then a cos(Dec) should appear. - The paragraph beginning at l.337 is hard to understand for me. Is this because you turn p(d_EM|z_i) into p(z_i|d_EM)p(z_i)? If so, should p(z_i) be the overall redshift distribution of your selected galaxies rather than propto dV/dArea? The algorithm you write in 2.2 should not give you constant comoving density, I think.

AP: We mean that p(d_EM|H0) can be marginalized over z and Omega like \int dz dOmega p(d_EM|H0,z,Omega)p(z,Omega), where p(z,Omega) is some a priori information. We don't know the distribution of the galaxies in this area a priori. And even if we wanted to include the p(z) of the galaxies in this region, then we should also include the clustering information because of the presence of the Omega...a bit outside the scope of this work. We use a simplistic prior. - The assumption you describe in l. 347 is I think already made in eq. (2).

AP: They are not the same thing. The delta function in eq. 2 is saying that the position of the GW source corresponds to that of galaxy i. The assumption in 347 says p(Omega^obs|Omega^truth_i)=\delta(Omega^obs-Omega^truth_i)

## Maya Fishbach, Nov 5¶

I would like to echo Dan's concern about Figure 3 and the effect of the prior. I appreciate your explanation for how the galaxy cuts are causing the difference between the posteriors under the two prior assumptions in Figure 3. However, I disagree with your characterization of this effect as a systematic -- if the galaxy cut is affecting the results so significantly, then the cut is mathematically incorrect. Yes, you can apply a preliminary galaxy cut to speed up computation, but you cannot cut out galaxies that contribute to the H0 posterior within your stated prior range. In other words, you must keep all the galaxies that will have a non-negligible contribution to the H0 posterior within the stated range of your prior. Clearly some of the galaxies you have cut out are not negligible, or the result with the 50-90 range would not have such a clear peak. To be mathematically correct, the 50-90 range result must be identical to the wider prior result (and offset by a normalization constant). Even if the wider prior result is correct, I worry that a referee looking at Figure 3 would have doubts about the entire analysis. This is especially true because one way to do a sanity check on the analysis is to look at the relative redshift over-density corresponding to the peak in H0. The current Figure 1 shows the "absolute" over-density in terms of galaxy counts (dN/dz - dN/dz_comoving), but not the relative over-density (dN/dz / dN/dz_comoving), which is the meaningful quantity for this analysis. I think that showing the magnitude of the over-density by replacing Figure 1 with a plot of the relative over-density would be much more convincing.

MSS: What you are calling "mathematically incorrect" is a consequence of lack of understanding of the procedures we used. If you think a little bit about the process we used in this analysis you will see that it is very straightforward and consistent: we first chose the range of H0 values we want to probe (say, 50-90), and then use the +/-90 percent cl. in the GW event distance to define a redshift range of interest. We then select galaxies within that range and perform the analysis as described in the methods section. If we change the H0 range (say, to 20-140) we need to redefine the redshift of interest and redo the galaxy selection to maintain consistency in the methodology. Otherwise, in one case we would be considering 90 percent c.l. and in the other case we would be considering something smaller, say 80 percent or so. Conversely, if we use 90 percent for the wider range and don't adjust the galaxy sample to maintain consistency, we would be using a larger c.l. range for the narrower prior range, which would also be an inconsistency. Note that in angular space we also keep only galaxies within the 90 percent c.l. range, so we are keeping consistency there too. So, as you can see, there is no big "mathematical" issue. We are only being very careful about doing things consistently.

AP: Here is the plot you and Dan Holz asked for. We don't love the idea of showing a galaxy distribution that diverges at 0 by construction, and I am not sure that one can make an absolute statement about this being the meaningful quantity for this analysis (also considering that in the final likelihood we multiply the p(z) by the volume element rather than dividing). The top right plot of Fig 1 is our plot showing the data, the DES redshift galaxy distribution, which we compare to a galaxy distribution uniform in comoving volume by showing the difference between the two. We like it this way, and I think that there is nothing wrong with keeping the histogram as is. If the referee asks for a plot in which we divide the redshift histogram by the uniform in comoving volume histogram, here it is:

## David Burke, Nov 5¶

The "money" conclusion given in this paper is that measurement of H0 with 4.5% precision is potentially possible from a statistical analysis of one hundred GW events. The conclusion is based on assumptions about how well distance and redshift measurements can be made. Not considered in the analysis is the dependence of this precision on the underlying cosmological model - the study assumes a fixed cosmology with other parameters perfectly known. It seems to me this "standard siren" result is most naturally compared with the BAO "standard ruler" result (e.g. as expected from DESI) as they should depend on cosmology in much the same way. The recent DES "inverse distance ladder" measurement of H0 from combined SNe and BAO (docid=12894, pubid=287) by Macaulay et al is a useful place to see what the limiting systematics might be from a more complex expansion model (i.e. inclusion of de-acceleration, jerk, and snap parameters). That paper emphasizes the "distance duality" relation, and from Figure 1 on Macaulay et al it seems with the present data sets the uncertainty on H0 from BAO alone is about +/- 4% due primarily to covariance with the higher order terms in the expansion history. My specific concern is that this paper (Soares-Santos et al.) focuses too much on how the (not-so-interesting) central value of H0 (69.6 +13.0 -12.9) is determined but relies on using simulation to extrapolate the (very-interesting) error from 1 event to that expected from 100 events. My suggestion is that this paper should state more clearly (in the Abstract, text, and conclusion) that this study considers only fixed cosmology, should discuss more fully what systematic errors might persist, and in particular, discuss the relation of a "standard siren" to a "standard ruler" - are they conceptually degenerate?

[DH: The result for 100 events isn't new to this paper (see e.g., https://ui.adsabs.harvard.edu/#abs/2018Natur.562..545C/abstract ; that paper includes the full H(z) in its analysis). There are certainly interesting questions about comparing to BAO, and fully exploring other cosmological parameters. However, the main point of this paper is to do the first statistical standard siren measurement with a BBH system. This is not the first paper to develop the method, nor is it the first paper to make forecasts, nor (as you point out) is the basic H0 measurement of any interest. The interesting aspect is that this is the first paper ever to apply the statistical BBH method to real data.]

MSS: Thanks for the comments David. We really are not very sensitive to cosmological parameters beyond H0 for events at such nearby distances. It would be wonderful to have enough sensitivity to talk about those parameters, but we are not there yet... We mention the sources of systematic uncertainties we expect in section 4 (near line 450) and discuss them also in section 5 (from line 527 to 563). We think that photozs will be the most important source of systematic uncertainty (biases, outliers, and non-Gaussian pdf shapes). There is also the fact that we don't know if BBH mergers occur preferentially in certain types of galaxies. We plan to do a more detailed study in the future of the impact of these sources of systematic uncertainties in a future paper.

[DLB] Ok. Thank you for the response. But responding to me is not the important thing. I think that some of your comments (especially the reference in Nature) should go into the text to be more clear about what is new in this paper and what the limitations and challenges would be to reaching the 4.5% precision.

## Daniel Holz, Nov. 5¶

Marcelle and Antonella, congratulations on putting together this draft. The key result is a clear and relatively narrow peak at the right value of H0. This would be a dramatic and exciting result, if true! As I've mentioned to you both in the past, however, I have some concerns about the result, and I think it might be valuable to perform some additional tests to ensure this result is correct.

I believe some of these concerns are shared by Jon Gair and Will Farr, the official LIGO reviewers, and they have already conveyed them to you directly. My initial plan was to let the LIGO review take its course, but given that this paper is now in CWR and rapidly progressing towards posting/submission, I thought I would raise some questions on the wiki:

MSS: Thank you, Daniel, for taking the time to write down the comments to Jon and Will, the LIGO reviewers, and also here. We have been in touch with them and we think that the gist of their comments is contained above. We will try to answer them to the best of our ability. We hope we can go through the LIGO review process as quickly as in DES.

[DH: Thank you for responding so quickly!]

- As Maya Fishbach first pointed out a while back, there is something inconsistent in the "money plot" (Fig. 3). Both Dan Scolnic and Maya bring this up above, and I confess I don't follow your replies. If a galaxy contributes in the [50--90] range for the [20--140] prior, then it should contribute equally when the [50--90] prior is applied. Your explanation implies that galaxies are being dropped towards the edges of the prior range, which would naturally lead to a peak towards the center even if the actual measurement is uninformative.

MSS: See Antonella's plot above. It addresses your point. We used an asymmetric prior (centered away from H0=70) and still got a similar result, not something artificially peaked in the middle.

[DH: That plot is very helpful! It is reassuring to see that the peak is not simply at the middle of the prior range. However, I think Maya's essential point still remains: in principle, the shape/peak of the curves should not change with different top-hat priors. Do we agree about this, in principle? I guess the concern might be that the approximations you are making cast doubt on the shape of the posteriors, and therefore bring into question the resulting intervals on H0 (which is a key result of the paper). Is there a way to do the analysis so that the shape of the posteriors don't change for different top-hat priors? For example, would using the 99% localization volume, instead of 90% as is being done now, help?]

AP: If you like to see the galaxies' positions as priors, then these have to be consistent with the H0 prior, and the combination of the two is not flat in H0 anymore. We will consider all the suggested tests, if we can do them within the timescale we have.

MSS: The change in shape is a consequence of the way we are doing the analysis, keeping consistently a flat H0 prior in both cases. It is not a mistake. We cannot go around changing the analysis methodology after unblinding, specially if there is nothing wrong with it, and the only reason is that we don't like how it looks. Simply pushing the edge to 99 percent will maybe make the plot look "better" but will not eliminate the effect. And thinking about what it would take to push the range up to 99 percent or so, I think we would run into another issue: the event is so close, and the luminosity distance gaussian so broad, that in the lower edge of the distribution of galaxies we will start to get in the z<0.01 regime which is a really hard regime to handle due to small number statistics and all sort of issues with super bright galaxies. The 90 percent range is a reasonable choice for this analysis, and we need to keep consistency, as described.

- For the result to be correct, there needs to be a huge over-density at the "right" redshift (z~0.15). Fig. 1 does not show the relative over-density, so it is very difficult to judge if this the case. Given that there are ~70k galaxies, it would really be surprising (although not totally impossible) for such a convenient over-density to exist. It might be interesting to redo Fig. 1 showing the fractional over density, preferably weighted by the actual sky map?

MSS: That is the very point. There are two large over-densities: one at z~0.06 and, more importantly, one at z~0.12. We describe them in the paper in section 2.2 (around lines number 295-309). There are, in fact, several known massive Abell clusters in the region. And the structures we see are beautiful large scale walls and filaments, which we can clearly see in the spectroscopic data. Antonella made the plot you are talking about and is posted near Maya's comment. You can see the discussion there.

[DH: I look forward to seeing the plot. The over-density presumably can't be just one or two additional rich clusters, but instead needs to be the equivalent of many (tens of?) thousands of additional galaxies all in a localized region.]

MSS: Abell 3112, 3109, 3102, 3104, 0342, APMCC 351 and MSCC 118. These are some of the massive (10^14 M_sun) clusters in that region. You can see in Antonella's plot above, that the two overdensities are huge...

- The analysis of GW170814 should be identical to what we did in our **statistical** analysis of GW170817 [https://ui.adsabs.harvard.edu/#abs/2018arXiv180705667F/abstract], except that your volume is 5,000 times larger (and therefore contains ~5,000 times more galaxies). For statistical GW170817 we found H0=76^48_23 (which we thought was pretty good!). This GW170814 analysis, instead of being significantly worse, is comparable to our **counterpart** GW170817 measurement [https://ui.adsabs.harvard.edu/#abs/2017Natur.551...85A/abstract]. In other words, your analysis produces a result, marginalizing over 70,000 galaxies, that is comparable to the measurement when a single unique host galaxy is identified. Do you find this surprising? Part of the issue may be that, for a limited prior range, even a completely flat (uninformative) posterior would produce an "interesting" 90% interval, just because there's not enough room in the prior.

MSS: I can see where you are coming from on this. We made an extensive comparison with the results from both those papers, around lines 501-526. Please see what we wrote there and let us know if we should expand on it. We agree that the results quoted in the abstract could more properly express the complexities related to the prior choice. We will work on rewording the abstract to include the information from Table 1 and its implications.

AP: Dan, let me add that the two measurements you are comparing use different priors (both different in range and shape, as Abbott et al use a flat-in-log prior). We show both results using a narrower and broader prior. We explicitly say that the narrow prior result is prior dominated for a single event of this type. This is in Section 3. At the lines that Marcelle pointed you to, we have compared our result using the broader prior to the GW170817 counterpart result (as these use more similar prior ranges), and say that our result is, as expected, much worse.

- It would be interesting to run with the same LVC sky map, but shift the galaxy catalog to a random uncorrelated line-of-sight. We were advocating this test over a month ago, when Zoheyr wrote code to rotate the galaxy catalog appropriately. It would hopefully be straightforward to produce results with the **identical** pipeline. If the other lines-of-sight show results with peaks distributed uniformly in H0, or with multiple peaks, or (as I think should generically happen) with no particularly distinct peaks, then that would be an important validation of your method. If, instead, the other lines-of-sight show a similar peak in the middle of the prior range, then this would be a source of concern. Have you run these tests by any chance?

MSS: Yes, that would be an interesting test. That is what we did with the simulations.

[DH: I am concerned that the simulations aren't sufficient to show these effects, because of simplifications and approximations that were used to build them. It would be very interesting to run the identical pipeline to what was done for Fig. 3, but on some other lines-of-sight in the galaxy catalog (constructed the exact same way).]

I have additional concerns and comments about the method, the results, and the draft, but I thought it might be best to mention the ones above first. It would be **fantastic** if these results are correct, since they imply that the statistical H0 method is much more effective than expected. However, given some of the concerns listed above, it might be valuable to perform a few additional tests before publishing such an important result.

MSS: We appreciate your comments and input. We will have the CWR telecon on Friday at 10am Chicago time. We think it would be good if you and the LVC collaborators who are interested in this paper would join the discussion there.

[DH: I won't be able to make the telecon, unfortunately. I will be in a meeting of the Bulletin of Atomic Scientists on Friday, setting the time of the Doomsday Clock. The threat of global annihilation takes precedence, even over standard sirens...]

## Comments from Jon Gair and Will Farr (LVC) - on Nov 4¶

Review statement for joint DES/LVC GW170814 Hubble constant paper

==========================================================

This paper reports the first estimate of the Hubble constant from a dark siren. It uses public GW data from the GW170814 binary black hole event and photometric redshift measurements for galaxies consistent with the GW-inferred sky location to derive a statistical constraint on the Hubble constant. The current result is 69.6 +/- 13 km/s/Mpc, based on a prior of [50, 90] km/s/Mpc. The paper has begun circulation within the DES collaboration and so it is being circulated to the LVC now to coincide with that process. The methodology described in the paper is largely correct but the reviewers are not yet convinced of the correctness of the final result, and therefore cannot at this time recommend the LVC join in the paper. We are continuing to work with the DES paper writing and analysis team to resolve the remaining issues. We have one major issue with the paper as it stands and a number of minor issues, outlined below.

Major Issue

-----------

Figure 3 of the paper shows two posterior distributions for the Hubble constant. These are based on two different flat priors - H0 \in [50, 90] and [20,140]. These two posterior distributions are inconsistent with one another - the peak is in a different location and the shape changes under the narrower prior, when the shape of both posteriors should instead be identical in the overlapping region. Based on iteration with the DES authors, this change seems to arise because cuts are applied to the galaxies included in the catalogue based on the GW-inferred luminosity distance and the maximum/minimum Hubble constant value included in the prior to eliminate galaxies at very large and very small redshift. These cuts appear to be artificially inflating the significance of the "headline result" of the paper by narrowing the posterior. They are clearly too aggressive, and we would request that the galaxy catalogue that was used for the wide-prior result also be used for the narrow-prior result so that the results are at least consistent. We will also request additional checks to assess the ways in which the galaxy cuts influence the result for both choices of prior to ensure that the cuts are not affecting the final credible interval reported for H0.

Minor Issues

------------

The posterior in Figure 3 appears to be inconsistent with the quoted uncertainty range. The 68% confidence interval is [52.7, 82.6], which represents 65% of the original prior range. The posterior should therefore be much flatter than it appears in the Figure.

It would be nice if Figure 3's y-axis started at zero. This would make it easier to visually assess the relative change in the posterior across the range of H0 values.

The result is driven by the observed over density in galaxies in the footprint of GW170814, as shown in Figure 1. The scale of the over density can be used to assess the rough validity of the result, provided that the average comoving number density (dN/dz)_com is provided in addition to the difference of the estimated and average number density plotted in the Figure. We would request this number be included in the draft.

Equation (6) in the paper double counts galaxies, as it includes both a sum over galaxies and a prior weighting r^2/H which is volumetric and hence larger for galaxies at higher redshift. This equation should include normalisation of the terms by evidence integrals Z_j = int p(dEM | zi) r^2(zi)/H(zi) dzi so that the galaxies contribute through the posterior from the EM data. We don't believe this will significantly affect the result but would request the equations be updated so that they are correct.

Equation (7) in the paper is only correct for a flat prior. This should be rewritten as a prior on H0 times a product of likelihoods for each event.

Jon Gair

## Zoheyr Doctor, Nov. 6¶

Thank you for putting together this paper so quickly, Marcelle and Antonella! It's clear you have put in a lot of time into the pipeline and draft. Before any comments on the specific text, I have a few scientific questions, both of which I brought up before CWR:

MSS: Thanks, Zoheyr.

1. The final H0 posterior distributions confuse me. How can they have different shapes for the same H0 values, even though both use a flat H0 prior? Since p(H0|data) \propto p(data|H0)p(H0) and p(H0)=const for both priors, any changes in shape of p(H0|data) must be due to your likelihood p(data|H0). But the likelihood knows nothing about the prior, so should not change from prior to prior. This means that there must be an error in the likelihood calculation. As has been pointed out, this is probably due to cutting the galaxy catalog early, effectively changing the 'data' in p(data|H0). Of course, the data is the data, and shouldn't change based on the analysis we perform. Am I missing something here? As far as I understand, there's no ambiguity on what p(data|H0) is, and it certainly shouldn't depend on prior choice. So my opinion is that the galaxy catalog needs to not be cut so short, neither in redshift nor in sky position (i.e. use larger than 90% credible bounds).

MSS: Please see the discussion above, especifically our responses to comments by Scolnic, Maya, Holz. The change in shape is because of the way we select galaxies that go into the calculation, picking 90 percent of the GW localization volume, consistently with the prior choices.

[ZD: I understand what was done and had read through the responses, but unfortunately I agree with the other commenters and I don't think the different looking posteriors can be correct. Regardless of prior choice, the likelihood p(data|H0) **should not change**. This is just a fact of likelihoods. They don't contain any information about our prior beliefs. This is also true of the likelihood written in equation (2). No where in this equation does anything depend on the prior. So at a minimum, equation (2) is not a true reflection of what is going into Figure 3. As you and others have noted, dependence on the prior is being introduced by cutting on data_EM based on the prior bounds that are not wide enough. This goes against the fact that data_EM doesn't change based on our analysis. The data is always the same. We can only choose to ignore some of the data if it doesn't affect the likelihood. In this case, the data that was excluded is clearly affecting the likelihood. So just to be totally clear, I do not think the differences in posterior shapes in Figure 3 can be correct because (a) the likelihood shouldn't know about the prior choice and (b) the data we collected does not change based on the analysis we run. I see why you wanted to be consistent, but in this case it's leading to a **mathematical** inconsistency. ]

2. Is the overdensity in the true galaxy redshifts or the overdensity in DNF redshifts driving the peak in H0? The tests I ran on off-source patches of the catalog showed common structure along different lines of sight for DNF, and at approximately the redshifts at which you find overdensities. Are we sure that the DNF systematics aren't therefore driving this H0 result? In figure 2 bottom panel, the individual posteriors shown in gray from the simulations tend to have little structure and are mostly uninformative. Why do the final H0 posteriors in Figure 3 with the real DNF data do so much better than the posteriors in the simulations? **An easy check of whether DNF is having an effect is to run on off-source patches of the catalog**. If DNF is driving the result, then that's fine -- we just have to say that in the paper. I think running on off-source patches is the best test because it (a) avoids any assumptions in turning the simulated catalog into an observed DNF catalog and (b) gives a real indication of what types of H0 posteriors to expect. At a minimum, it would be nice to see plots of dN/dV for different lines of sight in the catalog.

MSS: The overdensities you are seeing are real structures, not a photo-z artifact. In fact, once you consider the photoz undertainties, the effect is to wash away the structures a bit - not to create them. Since that test that you made during the internal review process (thanks for doing that, by the way), we did extensive checks, using spectroscopic data available in the region, and running a second, independent photo-z code. We showed that the structures are real. We present a discussion about these checks in the text, in section 2.2 (I mentioned this already above too, please see my responses do Holz's comments) Please check again section 2.2 and see if you need any more clarification. We'd be happy to improve the wording on this. We also checked patches outside of the region of interest. There are peaks and structures there too. Real peaks. Spectroscopically confirmed peaks. Which show that what you saw was real, normal large scale structure. The patches that you looked at were probably not far enough apart to be totally uncorrelated.

AP: Yes I can confirm that those patches you looked at were adjacent to what we are using here. These are big chunks of the DES footprint by the way! But stuff at very low z will likely be correlated over large areas of the sky. We found a giant wall at z~0.06 covering a lot of the DES footprint by looking at spectroscopic samples in this region and in other parts of the sky. I have run ANNz2 to get some photozs which were optimized for our science case, and I found them to be much better in terms of sensitivity to the training sample dN/dz and error estimation. So this is what we are using.

[ZD: I agree that the uncertainties do wash things out a bit. But as seen in the plot posted by Antonella above, this also creates a dN/dV overdensity at low z because the errors are Gaussian in z, so another issue may be being introduced. For what it's worth, the two patches I analyzed were 30x30 sq degrees centered at ra,dec = 45,-23 and 345,-53, so they were pretty far apart. If the overdensities are just large-scale structure, then cool! But either way, there are aspects of the catalog that are not truly captured in the simulations. (hence the difference of shapes of gray lines in Figure 2 compared to posteriors in Figure 3). The most agnostic way to check how large-scale structure (or possible redshift systematics) affects the measurement is to simply run on a different portion of the catalog. (I can pass along code to rotate galaxy catalogs if you don't already have some to do this). This would also show the different values of H0 that might or might not peak in off-source data, therefore lending credence to the paper's claim that the galaxy catalog is giving us real information on H0. In my opinion, showing H0 posteriors from different lines of sight would make this paper MUCH stronger, because it would show the robustness of the measurement. Part of the reason people believe our LIGO measurements is because we estimate our backgrounds using "off-source data", not just an imperfect model of the noise. I think the same should be done here.]