
Peer review: benefits, biases, and alternatives

How can we ensure the quality and integrity of scientific research? How can we select the best research projects for funding and the best articles for publication? One of the mechanisms historically built into science to answer such questions is the peer review process, which is being investigated by the project Research and Innovation Research: Indicators, Methods, and Evidence of Impacts, funded by the São Paulo Research Foundation (FAPESP). The procedure is widely used by scientific journals, funding agencies, and other selection and decision-making bodies at universities and research institutions. However, peer review is subject to several types of biases. In addition, peer review helps delimit knowledge fields, subfields, and research communities (and thus operates as a mechanism of inclusion and exclusion), influencing the status and recognition of researchers. [1]

In the case of academic journals, peer review involves the authors of the paper, the journal’s editor, and reviewers, that is, experts technically capable of judging the submitted work. The assessment can be double-blind, in which neither reviewers nor authors know each other’s identity; single-blind, in which the reviewer knows who the authors are; or open, in which the identities of both author and reviewer are known. Each mechanism has its advocates and critics: double-blind assessment can prevent judgments based on the author’s identity, while open review can restrain criticism, since identities are made public. In funding agencies, depending on the funding scheme, the applicant’s identity is often revealed, since some selection criteria relate to the researchers’ academic trajectory and curriculum. In this context, the assessment generally passes through more than one decision-making body (e.g., scientific committees) and involves both ad hoc reviewers and the agency’s own staff.

Criticisms of the concept and practice of peer review have accumulated in recent decades, centered on its inconsistency, since reviewers frequently disagree about the same article or project, and on the biases associated with this selection process. Biases related to the author’s characteristics stand out here, chiefly those involving gender, nationality, affiliation, language, and prestige. Biases can also stem from the reviewers’ profiles, given differences in culture, standards, degree of expertise, and specialization across knowledge fields.

Finally, biases also relate to the content of the article or project being evaluated. In some cases, there is a tendency to assess more positively works that confirm the study’s initial hypotheses, and to reject interdisciplinary research and riskier or more daring projects that do not follow traditions or mainstream approaches, themes, and methods, revealing a relative conservatism among evaluators. In this sense, peer review can also foster a certain risk aversion. In the case of funding agencies, the complexity of the decision-making and assessment process introduces yet another possible source of bias: the agency managers’ own decisions, which may or may not follow what the specialists recommended. [2]

Another recurrent criticism concerns the reviewers’ work itself, which funding agencies and journals usually rely on without payment and often without adequate training or guidance.

The debate on the use of peer review by research funding agencies, and on its biases, has accompanied the very institutionalization of this practice. While peer review crystallizes the autonomy of science, presenting itself as the best available way to decide what to fund and what not to fund, it does not guarantee that the best decisions are made. As Mitroff and Chubin (1979) put it in a pioneering discussion on the subject, “(…) the process can be used to justify any decision” (p. 201). [3]

These authors review two studies from the 1970s on peer review at the US National Science Foundation (NSF), based on the perceptions of reviewers and of researchers who had submitted proposals to the agency. More than discussing the biases that such studies pointed out in the peer review process, the authors discuss the biases of the studies themselves, which in turn can help to explain, or even determine, conclusions about the adequacy of peer review. They conclude that the data available at that historical moment were not sufficient to settle the debate.

Since then, much has been written and researched about peer review in funding agencies. Let us look at some examples.

Marsh, Jayasinghe & Bond (2008) [4] observed biases in the evaluation of the quality of research proposals submitted to the Australian Research Council. The authors found a correlation of only 0.44 between the assessments of two independent reviewers of the same proposal, with differences across knowledge fields: the correlation was higher in the hard sciences than in the humanities and applied social sciences.
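
As an illustration of what such an inter-rater correlation measures, the minimal Python sketch below computes the Pearson correlation between the scores that two independent reviewers assign to the same set of proposals. The scores are entirely hypothetical and serve only to show the calculation behind figures such as the 0.44 reported above.

```python
# Minimal sketch: agreement between two independent reviewers of the same proposals.
# The scores are hypothetical and only illustrate the calculation.
from math import sqrt
from statistics import mean

reviewer_a = [3.0, 4.5, 2.0, 5.0, 3.5, 4.0, 2.5, 4.5]  # scores given to 8 proposals
reviewer_b = [4.0, 3.5, 3.0, 4.5, 2.5, 4.5, 4.0, 3.0]  # same proposals, second reviewer

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(f"Inter-rater correlation: {pearson(reviewer_a, reviewer_b):.2f}")
```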

Witteman et al. (2019) [5], in turn, found a gender bias when analyzing applications to the Canadian Institutes of Health Research: women’s CVs consistently received worse peer-review scores than men’s, although there were no significant differences in the assessment of the research proposals themselves. Bornmann & Daniel (2005) [6], in addition to gender bias, identified affiliation bias when examining applications for doctoral grants in biomedicine at the Boehringer Ingelheim Fonds. Additionally, Ginther et al. (2011) [7] provided evidence that black applicants are less likely than white applicants to receive research funding from a specific National Institutes of Health (NIH) program.

In addition to the authors’ characteristics, the reviewers’ profiles also shape decisions. An analysis of research proposals in endocrinology at a US medical school (Boudreau et al., 2016) [8] showed that reviewers consistently gave lower scores to proposals closer to their own field of expertise: the smaller the intellectual distance between evaluator and evaluated, the more rigorous the assessment tends to be. In the same vein, Seeber et al. (2021) [9] analyze how reviewers’ previous evaluation experience contributes to increasing the reliability and convergence of their decisions.

Boudreau et al. (2016) also found signs of risk aversion in the evaluation of proposals, indicating that the very content of a proposal can bias peer-review assessment. The same discussion is raised by Veugelers, Wang, and Stephan (2021) [10], who find that researchers with a history of risky research are less likely to be selected for funding, especially at the beginning of their careers.

Finally, to the biases related to the assessment of the quality of the work, to the authors’ and reviewers’ characteristics, and to the content of the work, one can add those related to the discretionary power of the funding agencies. Beyond the reviewer’s recommendation, the agency’s own decision is at stake, and with it the way that recommendation is weighed, since it is up to the agency whether to follow it. Ginther & Heggeness (2020) [11] address this issue by comparing recommendations and acceptances of candidates to an NIH postdoctoral program, highlighting the potential of peer review to identify the most promising researchers relative to the decisions made by the agency’s technical staff.

To confront such biases, and aiming at a more straightforward, inclusive, and transparent science, several alternatives to peer review have emerged, as well as other models for evaluating papers, projects, programs, public policies, individuals, and institutions (Bendiscioli, 2019) [12]. A systematization of 50 variations on the peer review model can be found in the study by Recio-Saucedo et al. (2022) [13]: alternatives developed to address specific problems and concerns in scientific publishing and in the allocation of research resources.

Based on bibliographic reviews and ad hoc consultations, the authors identified evidence of successful interventions in peer review and decision-making in different contexts, drawing on the experiences of organizations in the United States, Europe, Canada, Asia, and Oceania. Among the interventions run as pilots or simulated by researchers, a good part aimed at improving the identification of reviewers and the selection and matching of proposals with evaluators. To this end, funding agencies implemented initiatives such as:

  1. the creation of a web tool that partially automated the selection of reviewers based on bibliometric data used to gauge their competence, scientific activity, and field of expertise (a simplified sketch of this kind of matching appears after this list);
  2. the use of scientific productivity indicators as an additional element in the selection process;
  3. the training of reviewers on assessment criteria; and
  4. the involvement of applicants themselves as evaluators of proposals competing in the same funding scheme.
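
To give a concrete idea of how a matching tool of the kind described in item 1 might work, the minimal Python sketch below ranks reviewers by the overlap between keyword profiles (hypothetically derived from their publications) and the keywords of a proposal. All names, keywords, and the similarity measure are illustrative assumptions, not a description of any agency’s actual system.

```python
# Illustrative sketch of automated reviewer-proposal matching (item 1 above).
# All names and keyword profiles are hypothetical; a real tool would derive
# them from bibliometric databases (publications, citations, subject codes).

def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Keyword profiles (invented) representing each reviewer's publication record.
reviewers = {
    "Reviewer 1": {"peer review", "bibliometrics", "research policy"},
    "Reviewer 2": {"endocrinology", "clinical trials", "biostatistics"},
    "Reviewer 3": {"gender bias", "research funding", "peer review"},
}

proposal_keywords = {"gender bias", "peer review", "funding agencies"}

# Rank reviewers by how closely their expertise matches the proposal.
ranking = sorted(reviewers.items(),
                 key=lambda item: jaccard(item[1], proposal_keywords),
                 reverse=True)

for name, keywords in ranking:
    print(f"{name}: similarity = {jaccard(keywords, proposal_keywords):.2f}")
```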

Among the longer-term interventions that resulted in recommendations or changes in funding practices and in the broader research ecosystem, virtual expert panels stand out. Although the teleconferences did not significantly reduce discussion time, they produced opinions of a quality comparable to the traditional face-to-face meeting model, with a high level of reliability, in addition to cost reductions.

Other variations on the traditional peer review model concern the issue of anonymity, briefly discussed above, in an effort to improve the reliability of peer review and achieve greater consensus and diversity in the selection of proposals. Agencies can draw on five alternatives: (i) single-blind, in which the reviewer remains anonymous but the candidate is identifiable; (ii) double-blind, in which both reviewer and candidate are kept anonymous; (iii) triple-blind, in which candidates, reviewers, and the editorial/scientific committee are not identified; (iv) blind review, in which the reviewer is identified; and (v) open review, in which the identities of authors and reviewers are public.

For example, the Volkswagen Foundation (Germany) has used double-blind peer review to assess proposals submitted to its “Experiment!” funding initiative [14], which supports researchers in science and engineering in developing high-risk research, on the assumption that this model avoids biases related to the applicants’ characteristics and prioritizes the quality of the idea over the reputation of the candidate (Horbach & Halffman, 2018) [15].

Other interventions seek to improve assessment metrics and indicators in order to increase the capacity of funded research to deliver scientific, economic, and social benefits. Worth highlighting are the experiences of the US NSF and of the Wellcome Trust (a foundation that funds biomedical research in the United Kingdom), which combined traditional, well-established metrics of research performance and impact (such as citation counts and the journal impact factor) with alternative metrics (altmetrics), capable of giving decision-makers a more realistic picture of the activity and impact of scientific production (Recio-Saucedo et al., 2022; Pierro, 2016). [16]
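
As a purely illustrative sketch of how traditional metrics and altmetrics might be combined into a single decision-support indicator, the Python snippet below normalizes hypothetical citation counts, journal impact factors, and altmetric mentions and aggregates them with arbitrary weights; it does not reproduce the NSF’s or the Wellcome Trust’s actual methodology.

```python
# Hypothetical sketch: combining traditional metrics with altmetrics into one
# decision-support score. The data, normalization, and weights are illustrative
# choices, not any agency's actual method.

articles = [
    # citations, journal impact factor, altmetric mentions (all invented)
    {"title": "Article A", "citations": 120, "jif": 4.2, "altmetric": 350},
    {"title": "Article B", "citations": 15,  "jif": 2.1, "altmetric": 900},
    {"title": "Article C", "citations": 60,  "jif": 8.5, "altmetric": 40},
]

def normalize(values):
    """Rescale a list of values to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

cit = normalize([a["citations"] for a in articles])
jif = normalize([a["jif"] for a in articles])
alt = normalize([a["altmetric"] for a in articles])

weights = (0.5, 0.2, 0.3)  # citations, impact factor, altmetrics (illustrative)

for i, art in enumerate(articles):
    score = weights[0] * cit[i] + weights[1] * jif[i] + weights[2] * alt[i]
    print(f"{art['title']}: composite score = {score:.2f}")
```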

This practice aligns with the recommendations of the San Francisco Declaration on Research Assessment (DORA) [17] and has influenced funding agencies, which have been adopting broader metrics to assess researchers’ performance (Marques, 2021) [18]. Most funding agencies apply two or more criteria to assess research proposals and use different criteria depending on the program’s scope (Shailes, 2017) [19]. This is the case of the Australian Research Council (ARC), which uses “project quality and innovation” as the main criterion in programs for early-career researchers, whereas in funding lines related to university-industry-government collaboration the emphasis falls on the commitment of the partner organizations and on the project’s importance and level of innovation, with less weight given to criteria concerning the researchers themselves.

Other interventions propose alternative or complementary models to peer review, based mainly on the random distribution of research resources. Private foundations and public research funding agencies in countries such as New Zealand, Austria, Germany, Switzerland, and Denmark already adopt this type of system to select projects in specific funding modalities (FAPESP, 2022 [20]; Adam, 2019 [21]).
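
These schemes typically combine peer review with a random draw: proposals judged fundable above a quality threshold enter a lottery for the limited budget. The Python sketch below illustrates that logic with invented scores, an invented threshold, and an invented number of grants; it does not reproduce any particular agency’s rules.

```python
# Sketch of a partial (modified) lottery for allocating grants: peer review
# screens proposals for fundability, and a random draw distributes the
# available grants among those above the quality threshold.
import random

proposals = {
    "P1": 8.7, "P2": 6.2, "P3": 9.1, "P4": 7.8, "P5": 5.4, "P6": 8.0,
}  # peer-review scores on a 0-10 scale (invented)

QUALITY_THRESHOLD = 7.5   # only proposals judged fundable enter the draw
GRANTS_AVAILABLE = 2      # budget allows two grants (invented)

eligible = [pid for pid, score in proposals.items() if score >= QUALITY_THRESHOLD]
funded = random.sample(eligible, k=min(GRANTS_AVAILABLE, len(eligible)))

print("Eligible after peer review:", sorted(eligible))
print("Funded by lottery:", sorted(funded))
```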

Evidence of biases in peer review, and of the benefits of new peer review practices, has been generated across different countries, agencies, programs, and knowledge fields. Without a systematic review of the subject, however, generalizations are not possible: these deviations and solutions must be understood in the context of each experience and each historical moment. More than four decades after the work of Mitroff and Chubin (1979), the debate remains inconclusive, oscillating between the conviction that peer review is one of the best alternatives, if not the best, for supporting the prioritization of research investments, and the conviction that the practice is not exempt from numerous biases that can and should be discussed.


Authors

Evandro Coggo Cristofoletti is a researcher at the Department of Science and Technology Policy (DPCT) at Unicamp.

Ana Carolina Spatti is a postdoctoral fellow at the School of Applied Sciences (SAS) at Unicamp.

Adriana Bin is a professor at the School of Applied Sciences (SAS) at Unicamp.


References

[1] Zuckerman, H., & Merton, R. K. (1971). Patterns of evaluation in science: Institutionalisation, structure and functions of the referee system. Minerva, 66-100.

[2] Bornmann, L. (2011). Scientific peer review. Annual review of information science and technology, 45(1), 197-245. Tennant, J. P., & Ross-Hellauer, T. (2020). The limitations to our understanding of peer review. Research integrity and peer review, 5(1), 1-14.

[3] Mitroff, I.I., & Chubin, D.E. (1979). Peer review at the NSF: A dialectical policy analysis. Social Studies of Science, 9(2), 199-232.

[4] Marsh, H.W., Jayasinghe, U.W., & Bond, N.W. (2008). Improving the peer-review process for grant applications: reliability, validity, bias, and generalizability. American psychologist, 63(3), 160.

[5] Witteman, H. O., Hendricks, M., Straus, S., & Tannenbaum, C. (2019). Are gender gaps due to evaluations of the applicant or the science? A natural experiment at a national funding agency. The Lancet, 393(10171), 531-540.

[6] Bornmann, L., Daniel, H.D. (2005). Selection of research fellowship recipients by committee peer review. Reliability, fairness and predictive validity of Board of Trustees’ decisions. Scientometrics, 63(2), 297-320.

[7] Ginther, D.K., Schaffer, W.T., Schnell, J., Masimore, B., Liu, F., Haak, L.L., & Kington, R. (2011). Race, ethnicity, and NIH research awards. Science, 333(6045), 1015-1019.

[8] Boudreau K.J., Guinan, E.C., Lakhani, K.R., Riedl, C. (2016) Looking Across and Looking Beyond the Knowledge Frontier: Intellectual Distance, Novelty, and Resource Allocation in Science. Manage Sci., 62(10):2765-2783.

[9] Seeber, M., Vlegels, J., Reimink, E., Marušić, A., Pina, D. G. (2021). Does reviewing experience reduce disagreement in proposals evaluation? Insights from Marie Skłodowska-Curie and COST Actions. Research Evaluation, 30(3), 349-360.

[10] Veugelers, R., Stephan, P., Wang, J. (2021). Excess Risk-Aversion at ERC. Working Paper, KULeuven.

[11] Ginther, D.K., Heggeness, M.L. (2020). Administrative discretion in scientific funding: Evidence from a prestigious postdoctoral training program. Research policy, 49(4), 103953.

[12] Bendiscioli, S. (2019). The troubles with peer review for allocating research funding: Funders need to experiment with versions of peer review and decision‐making. EMBO reports, 20(12), e49472.

[13] Recio-Saucedo, A., Crane, K., Meadmore, K. et al. (2022). What works for peer review and decision-making in research funding: a realist synthesis. Research Integrity and Peer Review, 7(1).

[14] https://www.volkswagenstiftung.de/en/funding/our-funding-portfolio-at-a-glance/experiment

[15] Horbach, S.P., & Halffman, W. (2018). The changing forms and expectations of peer review. Research integrity and peer review, 3(1), 1-15.

[16] Pierro, B. (2016). Impact beyond academia. FAPESP Magazine, available at: https://revistapesquisa.fapesp.br/impacto-alem-da-academia.

[17] https://sfdora.org

[18] Marques, F. (2021). Revised standards for assessing quality. FAPESP Magazine, available at: https://revistapesquisa.fapesp.br/novas-reguas-para-medir-a-qualidade.

[19] Shailes, S. (2017). Peer Review: To fund or not to fund? eLife 6:e32015.

[20] FAPESP. (2022). A model that seeks to enhance the selection of projects through randomized drawing. FAPESP Magazine, available at: https://revistapesquisa.fapesp.br/modelo-busca-dar-mais-seguranca-a-selecao-de-projetos-por-sorteio.

[21] Adam, D. (2019). Science funders gamble on grant lotteries. Nature, 575(7784), 574-575.
