Wednesday, June 25, 2014

Pros & Cons of Pre-Publication Publicity (& BICEP2)

It all started with a press conference on March 17th, 2014. Leaders of the BICEP2 team claimed that they found the “first direct evidence of cosmic inflation”, and the “first images of gravitational waves, or ripples in space-time.” This is a huge advance in cosmology if true. The publicity occurred before the paper was accepted for publication in a journal, and the work has been cross-examined by experts ever since. For excellent summaries of the initial claim and the skeptical response by some scientists, see articles by Richard Easther, Alan Duffy and Heino Falcke. The article by Falcke is particularly interesting because he discusses science communication issues raised by this story.

In this blog post I will add to Falcke’s comments by discussing some of the pros and cons of publicizing a paper before it is accepted, using the BICEP2 result as a case study. I note that the BICEP2 paper has just been published in Physical Review Letters and includes a few substantial changes and some hedging on the original claims, as reported by Dennis Overbye in the New York Times, Jacob Aron, Lisa Grossman and Stuart Clark in New Scientist and Nadia Drake in National Geographic News.

Gravitational waves from inflation are expected to generate a faint but distinctive twisting pattern in the polarization of the cosmic microwave background (CMB), the leftover radiation from the Big Bang. This twisting pattern is known as a "curl" or B-mode pattern. For the density fluctuations that generate most of the polarization of the CMB, this part of the primordial pattern is exactly zero. Shown here is the actual B-mode pattern observed with the BICEP2 telescope, which is consistent with the pattern predicted for primordial gravitational waves. The line segments show the polarization strength and orientation at different spots on the sky. The red and blue shading shows the degree of clockwise and anti-clockwise twisting of this B-mode pattern. (Caption has been slightly modified from the CfA version.) Credit: BICEP2 collaboration

(In full disclosure, I work in public affairs in the Chandra X-ray Center based at the Harvard-Smithsonian Center for Astrophysics, where the BICEP2 results were announced and where part of the BICEP2 team is based, but I don’t know any of the team members personally. Also, unless noted otherwise, I will use the term “peer review” in the traditional sense, to describe independent review conducted by a referee or referees selected by a journal’s editors as part of their publication process.)

What are some of the advantages of doing publicity before a paper is accepted to a journal?
− Peer review is not a flawless process and this is a public way to acknowledge that. As biologist and blogger Jonathan Eisen has pointed out, we should not deify peer review. There are many examples where peer review has failed to detect serious problems in published science papers, including the well-known "arsenic life" paper in Science. This article by Carl Zimmer gave a devastating response to that paper by independent experts, not long after it was published, contrasting strongly with the very positive reviews by referees, as reported by Dan Vergano.
One of the problems caused by the arsenic paper was that multiple scientific disciplines were covered, making the paper difficult to referee, even using three reviewers. In astrophysics, most of the commonly used journals like The Astrophysical Journal usually use only one referee, so if the journal makes a poor choice of referee, the review can have very limited benefits. However, with BICEP2 a much narrower range of expertise was required than for the arsenic paper, and there was an obvious choice for a referee: one of the leading researchers for WMAP or Planck would have been very appropriate. In their published paper, the authors acknowledge “detailed and constructive recommendations” from two anonymous referees.
− In cases where peer review has failed we can thank post-publication peer review for exposing the problems. By publicizing before peer review, authors are effectively inviting an open, informal refereeing process to run in parallel with the journal’s peer review. The BICEP2 authors effectively acknowledge this in the “Note added” section near the end of their published paper. An open process like this gives scrutiny from the greatest possible number of experts before the paper is published. Such an approach makes a lot of sense in giving the best paper, assuming that the authors seriously consider the comments they receive, as the BICEP2 authors appear to have done. (As an aside, open peer review arguably should include a public record of comments and responses, as suggested, for example, by planners of the Open Journal for Astrophysics. However, this journal is still in the testing phase.) 
Similarly, on the publicity side, much of the potential benefit of open peer review depends on how the team responds to external comments and criticism. For example, if necessary, will they put out a new release or a correction explaining any changes to their original publicity claims, especially when more data becomes available?
− By placing the paper on the arXiv and publicizing it at an early stage, there is an opportunity for outsiders to witness some of the scientific process in action, as noted by Dennis Overbye. In the case of BICEP2 the skepticism of experts was quickly revealed in comments given in the press, as reported by Joel Achenbach. The paper triggered a flurry of activity, with 421 papers citing the original paper at the time of writing, most of them theory papers where the result was assumed to be correct. At the same time, the detailed observational results were closely examined and criticized, as noted by Richard Easther, Alan Duffy, Heino Falcke and others, resulting in some important revisions.

− By doing publicity before publication, results can be released to the public and to other scientists earlier than they otherwise would have been, since the authors do not have to wait for the refereeing process. In fields like medical science, delays can potentially be life-threatening. In astrophysics there is less practical need for haste, but long delays can be frustrating.
The BICEP2 telescope in the foreground and the South Pole Telescope in the background. Credit: Steffen Richter (Harvard University).

What are some of the disadvantages of doing publicity before a paper is accepted to a journal?
− The most important disadvantage: you give up the chance that a very good referee or referees will be found and that the paper will be improved before its claims reach the public. It is possible that just the clarity of description or the references will be improved, but it is also possible that significant problems in analysis or interpretation will be found. For example, maybe the referee will be the world’s leading expert on crucial but problematic details of the analysis. Waiting for peer review is the conservative approach to publicity, adopting the attitude that some independent checking is better than none.
In the case of BICEP2, their use of Planck data from a conference talk was problematic and they may not have properly accounted for all of the foreground emission, as explained in this paper submitted in late May by Raphael Flauger, Colin Hill and David Spergel. The Flauger et al. paper’s abstract finishes by saying:
“These results suggest that BICEP1 and BICEP2 data alone cannot distinguish between foregrounds and a primordial gravitational wave signal, and that future Keck Array observations at 100 GHz and Planck observations at higher frequencies will be crucial to determine whether the signal is of primordial origin.”
In the published version of their paper, the BICEP2 authors, to their credit, have added a number of important caveats, including this sentence added to the abstract: “However, these models are not sufficiently constrained by external public data to exclude the possibility of dust emission bright enough to explain the entire excess signal”. They also added this statement near the end of the paper: “More data are clearly required to resolve the situation.” If they had received some of this feedback about foreground emission from a referee or colleague before publicity, their press conference claims might have been made with less confidence.
− For academics, whether they like it or not, publishing papers in journals is still (*) one of the main arbiters of academic success. A successful paper isn't achieved until publication is complete and publication isn't complete until peer review is finished. So, by publicizing before peer review is finished you can give the appearance of adopting one standard for your scientific colleagues and a different, lower one for everyone else. 
− With publicity, especially a press conference, you can reach a much bigger audience than you would otherwise. The audience is the tax-paying public, who fund a large amount of research. So, as a matter of responsibility, standards of review should not be significantly lowered even if there are time pressures, such as fears of leaks or concern about being scooped by competitors.
My opinion is that the cons outweigh the pros in doing early publicity and that it is better for publicity to occur after peer review (this is approach #2 in the article by Heino Falcke). This approach shows that some effort has been taken by the authors to seek independent review of their work before publicity. It does not guarantee that the paper is flawless, but it does offer the chance to detect problems that were overlooked or not fully appreciated by the authors. For BICEP2, the changes made to the published version of their paper show that important improvements have indeed been made.

Others have questioned the timing of the BICEP2 publicity, including Bill Jones, a Princeton professor. Joel Achenbach’s excellent May 16th article in the Washington Post reports:

"Jones questioned the decision by the BICEP2 team to announce the discovery in a news conference prior to publication of a peer-reviewed paper. Kovac and Bock defended the news conference as a common procedure." 

That may be true for physics press conferences, like ones held at the LHC, but not for NASA ones, where an accepted paper is required.

Traditional peer review isn’t magical and it is easy to imagine very rigorous, informal reviews that completely bypass the journal’s process. However, it’s not clear that the BICEP2 team encouraged such independent reviews before publicity, as in the submitted, publicized version of their paper they didn’t acknowledge receiving comments from colleagues.

The BICEP2 publicity was obviously very effective at generating press coverage. If the paper had been put on the arXiv before publicity, to generate widespread comments from colleagues, then some enterprising reporters may have picked up the article and written about it before a press release was produced. It’s possible that early submission to the arXiv was considered, but there was concern that early leaking of results to the press would have a negative effect on press coverage from a later release. How could the team combat this problem but also generate reasonably widespread reviews from colleagues? One option: they could have waited for peer review to be completed before doing publicity and also sent the paper to selected colleagues by email when it was submitted, with a strong request to keep the paper private.

If publicity is performed before publication, I think it’s best to put out a conservatively-worded press release, allowing for the possibility of errors, especially if the result is significant (this is approach #3 listed by Heino Falcke). However, the announcement by the BICEP2 team was confidently worded and did not mention the need for more data, unlike the wording used in the published version of their paper.

I have no doubt that the members of the BICEP2 team are excellent researchers at the top of their game, but the claim they made is extraordinary, and we all know the saying by Carl Sagan about what level of evidence is required for such a claim. It takes time and a lot of cross-checking to accumulate extraordinary evidence, and that’s what we should try to present to the public. There now appears to be a mismatch between the ebullient publicity of March 17th and the tempered claims in the paper published on June 19th. It’s important to think about whether this mismatch could reasonably have been avoided.

(*) Please see this blog post  at Scientific American by Bonnie Swoger discussing whether the scientific journal has a future.

Friday, March 14, 2014

What Makes an Astronomical Image Beautiful?

(Note: this blog post was first published at the Chandra X-ray Observatory blog)

Astronomy is renowned for the beautiful images it produces. It's not hard to be impressed by an image like the Pillars of Creation or the Bullet Cluster, and the more eye-catching an image is, the bigger an audience it can potentially reach. So, as part of our job in astronomy outreach, we have each spent time thinking about what makes an astronomy image beautiful. As professionals, we’d like to go well beyond the intuition of the person who says, "I don't know anything about art, but I know what I like". One approach [1] is to list the key elements that make an image beautiful.

Two famous images, the Pillars of Creation from the Hubble Space Telescope on the left and the Bullet Cluster from the Chandra X-ray Observatory, Hubble and ground-based observatories on the right. Credit: left: NASA, ESA, STScI, J. Hester and P. Scowen (Arizona State University); right: X-ray: NASA/CXC/CfA/M.Markevitch et al.; Optical: NASA/STScI; Magellan/U.Arizona/D.Clowe et al.; Lensing Map: NASA/STScI; ESO WFI; Magellan/U.Arizona/D.Clowe et al.

Lars Lindberg Christensen and collaborators have done just that, in an excellent paper in the most recent issue of the open journal "Communicating Astronomy with the Public". After years of experience working with images from the Hubble Space Telescope, they have defined a set of 6 criteria that are important in determining the appeal of an astronomical image. These 6 criteria are photogenic resolution (equivalent to the number of stars which can fit side by side across an image), definition (the amount of structure or contrast in an image), color, composition (how the object or objects of interest fill the field of view), signal-to-noise ratio, and how well instrumental artifacts have been removed.

Here, I'll highlight a few of these criteria and how they relate to the images we make with Chandra data. There are some key differences between the optical or infrared data obtained with telescopes like Hubble or the Spitzer Space Telescope, and the X-ray data obtained with Chandra.

For the photogenic resolution they define a quantity r_photo that is the number of effective resolution elements across the field of view (FOV). The equation is r_photo = FOV / θ_effective, where θ_effective is the effective angular resolution. A higher r_photo results in a better quality image. So one tactic is to push for a very large FOV by making a mosaic of a large number of adjacent images, as some amateur astronomers do, or as Chandra users did to make a large mosaic of the Carina Nebula. The other option is to use data from a telescope with a very small θ_effective, where Hubble is unsurpassed at optical wavelengths and Chandra is unsurpassed at X-ray wavelengths.
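
To make the two tactics concrete, here is a minimal sketch of the photogenic resolution calculation. The function name, the field size and the resolution value are illustrative assumptions, not numbers taken from Christensen et al. or from any particular instrument.

```python
def photogenic_resolution(fov_arcsec: float, theta_eff_arcsec: float) -> float:
    """Number of effective resolution elements across the field of view.

    Both arguments must be in the same angular units (arcseconds here).
    """
    if fov_arcsec <= 0 or theta_eff_arcsec <= 0:
        raise ValueError("FOV and effective resolution must be positive")
    return fov_arcsec / theta_eff_arcsec


# Hypothetical single pointing: 10 arcminutes across, 0.5 arcsec resolution.
single = photogenic_resolution(10 * 60, 0.5)      # 1200.0

# Hypothetical mosaic three pointings wide, same resolution.
mosaic = photogenic_resolution(3 * 10 * 60, 0.5)  # 3600.0
```

In this sketch, widening the mosaic raises the photogenic resolution in direct proportion to the field of view, while halving the effective resolution angle would double it.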

A mosaic of Chandra images of the Carina Nebula. Credit: NASA/CXC/PSU/L.Townsley et al.

The authors point out that the domain with high values of the photogenic resolution - between 1000 and 10,000 - was dominated for many years by Hubble, but that more recently other observatories such as Chandra, the MPG/ESO 2.2-meter telescope, the Canada France Hawaii Telescope and ESO's VISTA and VST telescopes have joined Hubble. We're happy to have been included in this elite group.

To make a color image using optical or infrared data, observations have to be made using different filters chosen in advance, such as B (blue), V (visual) and R (red). A color is then assigned to the image obtained with each filter – in this case blue, green and red are the obvious choices – and the images are combined to make a color image. With Chandra, the energy (or wavelength) of individual photons is recorded, so different wavelength ranges can be chosen afterwards, giving us extra flexibility in making an image. By picking out different wavelength ranges the same dataset can be used to show different features, so we can experiment to see what makes the most striking image, or the most useful one to explain a particular science result. The greatest flexibility with choosing different wavelength ranges comes when the signal-to-noise ratio is high.
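
The idea of choosing energy ranges after the observation can be sketched in a few lines. This is only an illustration under stated assumptions: the event format (pixel x, pixel y, energy in keV), the band edges and the function name are all hypothetical, and real Chandra processing is done with dedicated analysis software rather than code like this.

```python
# Hypothetical energy bands in keV; low energies map to red, high to blue.
BANDS = {"red": (0.5, 1.2), "green": (1.2, 2.0), "blue": (2.0, 7.0)}


def band_images(events, width, height, bands=BANDS):
    """Bin a photon event list into one count image per energy band.

    events: iterable of (x, y, energy_keV) tuples, with 0 <= x < width
    and 0 <= y < height. Photons outside every band are ignored.
    """
    images = {name: [[0] * width for _ in range(height)] for name in bands}
    for x, y, energy in events:
        for name, (low, high) in bands.items():
            if low <= energy < high:
                images[name][y][x] += 1
                break
    return images


# Toy event list: five photons on a 2x2 pixel detector.
events = [(0, 0, 0.8), (0, 0, 0.9), (1, 0, 1.5), (1, 1, 3.0), (0, 1, 9.0)]
images = band_images(events, width=2, height=2)
```

Because the energies are stored with each photon, changing the entries in `BANDS` and rerunning the function gives a different color image from the same data, with no new observation needed.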

This leads me to describe the main challenge for producing beautiful Chandra images: sometimes the signal-to-noise ratio isn’t high. X-ray photons trace energetic events, such as regions close to a black hole, or the exploded guts of a massive star, but they tend to arrive from the cosmos in a trickle, rather than a flood. This limitation was most apparent early in the mission, when lots of different targets were observed and Chandra's observations usually involved short exposures. Later in the mission much deeper observations have been done, giving much higher signal-to-noise ratios and better images. For example, you can see the dramatic difference between these two images of the supernova remnant Cassiopeia A. The early Chandra image shown on the left had an exposure time of only 2 hours and doesn't look nearly as photogenic as a later image shown on the right, with an exposure time of 11 1/2 days.

Images of the supernova remnant Cassiopeia A. A comparison is shown between an early, short exposure (2 hr) Chandra image (left) and a later, deeper exposure (11.5 days; right). Credit: left: NASA/CXC/SAO/Rutgers/J.Hughes; right: NASA/CXC/MIT/UMass Amherst/M.D.Stage et al.

The signal-to-noise ratio is connected to the 2nd criterion, the definition in an image. If the signal-to-noise ratio is low because few counts have been detected, then this can seriously limit the amount of detailed structure that you can see in an image. Think about how much detail you can see in a painting that is well lit, compared to looking at one in the dark. Again, deep exposures help a lot. The deep observation of Cas A has significantly sharper features and more complicated structure than the shallow one.
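
A rough feel for how much deep exposures help comes from simple counting statistics. The sketch below assumes a constant source count rate, negligible background, and pure Poisson noise, so that the signal-to-noise ratio is the square root of the number of counts. These are simplifying assumptions for illustration, and the count rate used is hypothetical.

```python
import math


def poisson_snr(count_rate: float, exposure_s: float) -> float:
    """Signal-to-noise ratio for a background-free Poisson source.

    With N = count_rate * exposure_s detected counts, the noise is sqrt(N),
    so S/N = N / sqrt(N) = sqrt(N).
    """
    counts = count_rate * exposure_s
    return math.sqrt(counts)


short_exposure_s = 2 * 3600      # the 2 hour Cassiopeia A exposure
deep_exposure_s = 11.5 * 86400   # the 11.5 day Cassiopeia A exposure

rate = 1.0  # hypothetical counts per second in some region of the image
gain = poisson_snr(rate, deep_exposure_s) / poisson_snr(rate, short_exposure_s)
```

Under these assumptions the gain does not depend on the count rate: the deep exposure is 138 times longer, so the signal-to-noise ratio in any given region improves by the square root of 138, or roughly a factor of 12.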

Only a limited number of very deep observations can be made with Chandra each year. This means there is intense competition between astronomers to convince the members of the Chandra Time Assignment Committee to approve any observing proposals requiring a lot of observing time. Therefore, the science case has to be particularly strong, which becomes an advantage for us, because it means that we can publicize interesting science at the same time as showing off beautiful new images.

When we have only low signal-to-noise X-ray data to work with, we sometimes combine it with optical or infrared images, to capitalize on the high signal-to-noise in these other wavelengths. This can give a striking image, such as in this view of NGC 602.

A composite image of the star-forming region NGC 602, with Chandra X-ray data shown in purple, Hubble optical data shown in red, green and blue, and Spitzer infrared data shown in red. Credit: X-ray: NASA/CXC/Univ.Potsdam/L.Oskinova et al; Optical: NASA/STScI; Infrared: NASA/JPL-Caltech

The final criterion is instrumental artifact removal, where the relatively low count rates for Chandra are an advantage. When you detect a lot of very bright objects you tend to accumulate a lot of artifacts, so optical observations can require a lot of clean-up work. According to Christensen et al., one to two hundred hours can be spent manually cleaning a large image. Chandra images aren't free of artifacts, but they're not as much of a problem.

In their conclusion, Christensen et al. explain that the ideal case is for all six criteria to be fulfilled, giving a great image. It’s still possible to produce a great image with fewer, but it becomes more difficult and compromises have to be made, as they note.

Having explained some of the factors that help us produce beautiful Chandra images, we invite you to explore our photo album of images or our 3D image wall.

1: In this blog post I’ve discussed only one approach for thinking about how beautiful an image is. There are other approaches that consider aesthetics in general. My colleague Kimberley Arcand is involved in a project called “Aesthetics and Astronomy” which studies “the perception of multi-wavelength astronomical imagery and the effects of the scientific and artistic choices in processing astronomical data.”