Standardization vs Diversity: How Can We Push Peer Review Research Forward?

Karen Shashok

Peer review as a means of quality assurance has long been known to have serious limitations. Although research evaluators and promotion committees still treat it as a guarantee of scientific quality and relevance, they seem unaware that they may be expecting too much from it. Journal editors, in contrast, have become aware of its failings, but they rightly point out that no acceptable substitute has yet been developed. A realistic goal for now is therefore to look for ways to improve the process and make it more reliable, for example by enhancing its internal validity, its predictive power, or other indicators of the quality of the process and its outcomes.

Many of the indicators that have been tested thus far are derived from the quantitative statistical methods currently preferred in biomedical science. As noted by Overbeke and Wager[1]:

Most research on peer review in biomedical journals has been carried out by journal editors with a background in medical research. They therefore tend to use methods that have been developed for clinical trials. These methods...are widely accepted as valid ways of producing robust evidence about medicines and medical techniques, but they may be less appropriate for the complex psychosocial interactions involved in peer review.

Despite the apparent progress made since the 1980s in turning biomedical peer review research into a scientifically grounded endeavor, most "serious" studies to date have yielded inconclusive findings, or only weak trends toward statistical significance that neither confirm nor refute the basic assumptions about the usefulness of peer review. In short, this research has not produced enough solid evidence to establish which steps, criteria, and outcomes are most likely to help editors decide what to publish and what to reject. Ironically, there is precious little scientific evidence[2] for the effectiveness of the specific practices investigated thus far.