The number of new papers on the COVID-19 pandemic is doubling every two weeks and shows no sign of slowing. Many of these papers are published first on preprint servers, which means they are made public before having undergone peer review.
This makes it all the harder to judge their merit. Now, one start-up company says that its platform — called Scite.ai — can automatically tell readers whether papers have been supported or contradicted by later academic work.
Unlike conventional citation-metrics tools, Scite.ai tells users how often a paper has been supported or contradicted by the studies that cite it, as well as how many times it has simply been mentioned. The resulting reports display citations in the context in which they are mentioned, allowing users to assess for themselves how the paper is being cited.
So far, Scite.ai has analyzed more than 16 million full-text scientific articles from publishers such as BMJ Publishing Group in London and Karger in Basle, Switzerland.
But that is just a fraction of the scientific literature. “They’re limited by the literature they can get hold of and the machine-learning algorithms,” notes Jodi Schneider, an information scientist at the University of Illinois at Urbana–Champaign.
Still, the tool — accessible through a searchable website and as Chrome and Firefox browser plug-ins — can provide clarity.
In March, the site’s developers pointed its artificial intelligence (AI)-based engine to a database, which at the time included 30,000 papers on different kinds of coronavirus, to help provide context about how much weight each of the articles might carry (see go.nature.com/35nchkp).
They found that one 22 February preprint1, which indicated that higher levels of certain immune-signaling molecules are associated with more severe cases of COVID-19, was supported by a preprint2 from another group just five days later (see the Scite.ai report at go.nature.com/2ztuokb).
Conversely, users who search Scite.ai for a preprint that suggested HIV contributed to the formation of the new coronavirus will find that the report was contradicted by two follow-ups and supported by none (see go.nature.com/2vtdfxd).
(The preprint’s authors have since withdrawn it for revision in response to researchers’ comments on the work.) At the moment, Scite.ai’s analysis of the COVID-19 database of papers is not fully automated, so there can be a delay before new preprints are analyzed by the tool.
Scite.ai gets about 1,000 visitors a day and has some 2,700 registered users, a number that has been growing since 20 March, when the site began requiring users to register to view the full citation analysis for a given paper.
Citation counts are conventionally seen by researchers as measures of influence. But a high citation count doesn’t necessarily mean a paper is good, says Elizabeth Suelzer, a reference librarian at the Medical College of Wisconsin Libraries in Milwaukee.
Former physician Andrew Wakefield’s infamous retracted 1998 study that claimed a link between autism and vaccines is highly cited, she notes, but most of those citations are negative.
Without a thorough citation analysis, “it would be hard to tell why the article was so highly cited”, Suelzer explains. That, she says, is why a tool such as Scite.ai could be helpful. Other examples include the Retraction Watch plug-in that flags retracted articles for the Zotero reference-management software.
Josh Nicholson, co-founder and chief executive of Scite.ai, first recognized the need for such a tool in 2012. Nicholson was pursuing a PhD in cell biology at Virginia Polytechnic Institute and State University in Blacksburg when he read a Nature commentary that was making waves about scientific reproducibility3.
In it, a researcher formerly at the biotechnology company Amgen in Thousand Oaks, California, revealed that scientists there had been unable to reproduce the findings of 47 out of 53 ‘landmark’ cancer studies.
That spurred Nicholson and biologist Yuri Lazebnik, then at Yale University in New Haven, Connecticut, to propose a new citation metric to indicate whether a given study or its conclusions have been verified by subsequent reports4. The pair launched Scite.ai in April last year.
At the heart of Scite.ai is a machine-learning algorithm that scans research articles to identify which papers they cite, and to determine whether they support, contradict, or simply mention those papers.
The algorithm mines the text of articles from publisher partners, including the Rockefeller University Press in New York City and Wiley in Hoboken, New Jersey.
Scite.ai has also had preliminary conversations with Springer Nature in Heidelberg, Germany, which publishes Nature, Nicholson says. According to Nicholson, eight out of every ten papers flagged by the tool as supporting or contradicting a study are correctly categorized.
Although the machine-learning algorithm at the heart of Scite.ai has not been made public, Giovanni Colavizza, an AI scientist at the University of Amsterdam, currently a visiting researcher at the Alan Turing Institute in London, says that “their results are sound and precise”, from what he can tell.
“Most citations are classified as ‘mentions’, because the classifier is trained to be cautious, which is reasonable, too,” says Colavizza, who is a user of the platform and whose team has analyzed data from the start-up in the past.
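Scite.ai’s actual model has not been published, but the three-way labelling it performs can be illustrated with a toy rule-based sketch. Everything below is hypothetical: the cue phrases, the function name and the rule-based approach are stand-ins for illustration, not Scite.ai’s method, which is a trained machine-learning classifier.

```python
# Toy illustration of citation-statement classification: label each snippet
# that cites a paper as supporting, contradicting or merely mentioning it.
# The cue lists here are invented for illustration; a real system would be
# a trained model, not a keyword lookup.

SUPPORT_CUES = ("consistent with", "confirm", "in agreement with", "replicat")
CONTRADICT_CUES = ("in contrast to", "contradict", "failed to replicate",
                   "inconsistent with")

def classify_citation(snippet: str) -> str:
    """Assign one of three labels to a citation snippet."""
    text = snippet.lower()
    if any(cue in text for cue in CONTRADICT_CUES):
        return "contradicting"
    if any(cue in text for cue in SUPPORT_CUES):
        return "supporting"
    # Default to the neutral class: most citations are simple mentions,
    # mirroring the cautious behaviour Colavizza describes.
    return "mentioning"

snippets = [
    "Our results are consistent with those of Smith et al. [12].",
    "We failed to replicate the effect reported in ref. 7.",
    "Citation counts have long been studied [3].",
]
print([classify_citation(s) for s in snippets])
# prints ['supporting', 'contradicting', 'mentioning']
```

Defaulting to “mentioning” when no cue fires is one crude way to encode the cautious bias Colavizza notes; a trained classifier achieves the same effect through its decision threshold rather than a fallback rule.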
James Heathers, a data scientist at Northeastern University in Boston, Massachusetts, likes the way that, for each paper, Scite.ai shows snippets of the other articles in which that paper’s citations appear, saving him from having to look up each referring paper and hunt for this context.
“Every time I’m exploring a complicated topic from scratch, I’m using this,” Heathers says of Scite.ai. “The sentiment analysis seems to work really well,” he adds, referring to how Scite.ai categorizes positive and negative citation statements.