With rapid advances across today's scientific fields, and with ever more students choosing STEM careers, the number of scientific papers produced each year has skyrocketed. Each published paper offers its own take on a problem, along with detailed research and hypotheses, which makes verifying and vetting submissions a gargantuan task for the scientific community. Every year, many scientists see their papers rejected on the strength of reviews they consider ‘half-hearted’ or ‘incomplete’.
A group of researchers from Carnegie Mellon University in Pittsburgh, Pennsylvania, namely Weizhe Yuan, Pengfei Liu, and Graham Neubig, proposed the ambitious idea of automating the review of submitted scientific papers using artificial intelligence and machine learning. Their model would read each submitted paper and produce a gist of what the paper is about, together with a brief review of its contents. It would also assess papers on their credibility and comprehensiveness.
The team at Carnegie Mellon approached this demanding challenge by first setting a few standards. They went through a vast number of reviews from major machine learning venues such as ICML, NeurIPS, and ICLR and picked out the features of a well-written peer review. They came up with the following standards:
- Decisiveness: A review should take a clear stance on the paper and clearly state the basis for that decision.
- Comprehensiveness: A review should be detailed and well-organized, typically opening with a summary of the paper and its contributions.
- Justification: A review should support its judgments with legitimate evidence, especially wherever it finds the paper lacking.
- Accuracy: Any scientific statement made in a review must be factually correct; factual errors undermine the review's value.
- Kindness: A review should be written in amicable language and be easy to read.
After setting these standards, the team collected a dataset named ASAP-Review (Aspect-enhanced Peer Review), built from machine learning papers and their reviews at ICLR and NeurIPS between 2016 and 2020. With the data in place, the team framed review generation as a task of aspect-based scientific paper summarization.
Following the review guidelines of the ACL (Association for Computational Linguistics), the team identified eight aspects under which papers are reviewed; each review in the dataset was annotated with these aspects so that they could be fed into the system for better and more efficient reviewing (a hypothetical annotated record is sketched after this list). The eight aspects are as follows:
- Summary
- Motivation or Impact
- Originality
- Soundness (correctness)
- Substance
- Replicability
- Meaningful comparison
- Clarity
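To make the annotation scheme concrete, here is a minimal sketch of what an aspect-annotated review record in the spirit of ASAP-Review might look like. The field names, label format, and example text below are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical aspect-annotated review record in the spirit of ASAP-Review.
# Field names and label format are illustrative assumptions, not the
# dataset's actual schema.
ASPECTS = [
    "summary", "motivation_impact", "originality", "soundness_correctness",
    "substance", "replicability", "meaningful_comparison", "clarity",
]

annotated_review = {
    "paper_id": "iclr_2020_example",  # hypothetical identifier
    "review_text": (
        "This paper proposes a new pruning method. "
        "The idea is original, but the experiments omit strong baselines."
    ),
    # Review fragments tagged with an aspect and a sentiment polarity.
    "labels": [
        {"text": "This paper proposes a new pruning method.",
         "aspect": "summary", "sentiment": "neutral"},
        {"text": "The idea is original",
         "aspect": "originality", "sentiment": "positive"},
        {"text": "the experiments omit strong baselines",
         "aspect": "meaningful_comparison", "sentiment": "negative"},
    ],
}
```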
After setting the standards and fixing the judgment aspects, the team fine-tuned a pre-trained sequence-to-sequence model called BART to generate reviews. To surface the potential biases and discrepancies that come with reviewing, the researchers also defined a basic aspect score, measuring how large a share of the required aspects a review covers with positive sentiment.
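As a rough illustration of this modeling step, the sketch below produces a review-style summary with an off-the-shelf BART checkpoint through the Hugging Face transformers library, followed by a toy version of the aspect score. The checkpoint name, truncation strategy, and decoding settings are assumptions for illustration only; the authors fine-tuned BART on ASAP-Review and handled long papers with a more elaborate extract-then-generate pipeline.

```python
# A minimal sketch of review generation with a pre-trained BART model via the
# Hugging Face `transformers` library. The checkpoint and settings below are
# illustrative; this is not the authors' fine-tuned ASAP-Review model.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

def generate_review(paper_text: str, max_review_tokens: int = 400) -> str:
    """Generate a review-style summary of the (truncated) paper text."""
    inputs = tokenizer(
        paper_text,
        max_length=1024,   # BART's input window; long papers need truncation
        truncation=True,   # or an upstream content-selection step
        return_tensors="pt",
    )
    output_ids = model.generate(
        inputs["input_ids"],
        max_length=max_review_tokens,
        num_beams=4,       # beam search for more fluent output
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Toy aspect score: the fraction of the eight aspects that a review mentions
# with positive sentiment. The paper's actual definition differs in detail;
# this is a simplified sketch.
def aspect_score(aspect_sentiments):
    """aspect_sentiments: dict mapping aspect name -> True if positive."""
    return sum(bool(v) for v in aspect_sentiments.values()) / len(aspect_sentiments)
```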
Once the system was set up, the Carnegie Mellon team's own paper was submitted for review through this automated process, and the model generated the following excerpt:
“This paper presents an approach to evaluate the quality of reviews generated by an automatic summarization system for scientific papers. The authors build a dataset of reviews, named ASAP-Review, from machine learning domain, and make fine-grained annotations of aspect information for each review, which provides the possibility for a richer evaluation of generated reviews. They train a summarization model to generate reviews from scientific papers, and evaluate the output according to our evaluation metrics described above.”
The authors concluded that the system-generated review is relatively comprehensive and able to summarize the paper's main ideas, although in its current state it cannot fully replace manual reviewing. The generated review makes some incorrect assertions, but it also references key statements from the paper, making it easy for a human reviewer to spot critical information.
The model's downside, though, is significant. The team themselves have admitted to the complexity of analyzing the merit and intricacies of scientific contributions, and an automated reviewing system is nowhere near the reliability of a human reviewer. However, the system can go a long way toward helping reviewers sift through the many papers that are submitted, and the authors suggest that it could already be used as a tool in a machine-assisted review process. The members of the research team are confident that the tools, statistics, and scientific models presented in the paper will go a long way toward automating the review process.
Source: Weizhe Yuan, Pengfei Liu, and Graham Neubig, “Can We Automate Scientific Reviewing?”, arXiv preprint arXiv:2102.00176 (2021).