Star Wars: Unpacking the Publication Bias

Roughly one year ago, I wrote a blog article about a recent publication which found that different empirical methods expose different extents of p-hacking and thus lead to different degrees of publication bias. However, that study left unexplored a question that is critical for economics researchers: do editors and reviewers have a strong preference for estimation results with stars? To understand the motivation, extent, and sources of p-hacking in published papers, we need to unpack the editorial process, especially the roles of authors, editors, and reviewers in the publication process.

To help non-academic readers follow the peer-review process, here is a brief timeline of a scientific journal submission. First, the corresponding author submits the paper to a journal. Then an editor is assigned to handle the paper. If the handling editor believes the paper is outside the journal's scope or that its contribution is too minor, he or she will reject it without inviting referees to review it – a decision known as desk rejection. Otherwise, the editor sends the manuscript to potential reviewers for either a single-blind or double-blind review. The reviewers then submit their comments and recommendations to the journal. Finally, the handling editor weighs the reviews and decides whether to accept or reject the submission, or to request a revision (usually flagged as either major or minor) before it is reconsidered.

Obviously, the above peer-review process, and in particular the editor's role, significantly influences the final publication decision. Presumably, with enough submissions, the underlying distribution of test statistics across all submitted manuscripts should be smooth, with a single hump. This suggests an interesting way to trace the source of publication bias. If the distribution of test statistics among originally submitted papers is more heavily bunched around significance thresholds than that of the final published ones, the peer-review process mitigates p-hacking. Conversely, if the distribution is smoother among journal submissions than among publications, the peer-review process contributes to p-hacking and publication bias. A recent study by Brodeur et al. (2022; hereafter BCFL) examines this conjecture and sheds light on it by exploring the impact of the editorial process, thereby unpacking publication bias.

Figure 1: Distribution of z-statistics for initial submissions

Source: Brodeur et al. (2022)

BCFL collected 11,000 test statistics from nearly 400 manuscripts submitted to a top field journal in applied microeconomics, the Journal of Human Resources, between 2013 and 2018. Figure 1 shows the raw distribution of the test statistics for the entire sample of initial submissions. The distribution has a pronounced two-humped shape, with the second hump lying between one star (z = 1.65) and two stars (z = 1.96), suggesting that p-hacking may exist. BCFL then plot the distributions for desk-rejected and non-desk-rejected manuscripts separately, as shown in Figure 2. Interestingly, comparing the right panel (not desk rejected) with the left panel (desk rejections), editors seemingly, on average, filter out false positives and attenuate the extent of p-hacking: papers sent out for review display significantly less bunching than those that were desk rejected.

Figure 2: Distribution of z-statistics for desk rejected (left) and not desk rejected (right) manuscripts

Source: Brodeur et al. (2022)

Next, BCFL turn to the manuscripts that were sent out for anonymous review. The authors collect all referee reports and plot the distribution of the test statistics by categorized recommendation. Figure 3 shows the results. While the left panel (manuscripts recommended for rejection by referees) shows a roughly monotonically decreasing distribution, except for a very small local maximum at around two stars, the second hump is quite evident in the right panel (manuscripts recommended for acceptance or minor revisions). This suggests that, in contrast to the editors' filtering effect, reviewers tend to have a positive bias toward statistically significant results.

Figure 3: Distribution of z-statistics by reviewer recommendation (left: recommended rejection, right: recommend accept as is or with only minor edits)

Source: Brodeur et al. (2022)

Lastly, to evaluate the overall impact of the peer-review process on publication bias, BCFL compare the distribution of test statistics from the finally accepted manuscripts against that of all rejections, including both desk rejections and referee-based rejections. This captures the total effect, adding up the editors' filtering impact and the reviewers' positive bias. Figure 4 presents the distribution for all rejected manuscripts (left panel) and for the finally accepted manuscripts (right panel). The net effect displays much less bunching among accepted manuscripts than among rejected ones, suggesting the editors' de-p-hacking effect outweighs the anonymous reviewers' pro-p-hacking effect.

Figure 4: Distribution of z-statistics by rejected and final draft of accepted manuscripts (left: all rejections, right: accepted manuscripts)

Source: Brodeur et al. (2022)

Researchers generally believe that editors and reviewers have strong preferences for results with stars. However, BCFL's study warns that this is not, or at least not entirely, true. If so, why bother with Star Wars? How about having a coffee and signing a peace agreement with your non-starred results?

Chaoyi Chen


Brodeur, A., S. Carrell, D. Figlio, and L. Lusher (2022). Unpacking P-hacking and Publication Bias. Technical Report.

Cover image source: