Researchers from MIT and elsewhere are developing an AI that can read scientific papers and generate a plain-English summary of one or two sentences.

Scientific citation.

Image credits Mike Thelwall, Stefanie Haustein, Vincent Larivière, Cassidy R. Sugimoto (paper). Finn Årup Nielsen (screenshot).

A big part of our job here at ZME Science is to trawl through scientific journals for papers that look particularly interesting or impactful. These papers are written in dense, technical jargon, which we then take and present in a (we hope) pleasant and easy-to-follow way that anybody can understand, regardless of their educational background.

MIT researchers are either looking to make my job easier or get me unemployed; I’m not exactly sure which yet. A novel neural network they developed, along with other computer researchers, journalists, and editors, can read scientific papers and render a short, plain-English summary.


“We have been doing various kinds of work in AI for a few years now,” says Marin Soljačić, a professor of physics at MIT and co-author of the research.

“We use AI to help with our research, basically to do physics better. And as we got to be more familiar with AI, we would notice that every once in a while there is an opportunity to add to the field of AI because of something that we know from physics — a certain mathematical construct or a certain law in physics. We noticed that hey, if we use that, it could actually help with this or that particular AI algorithm.”

It’s far from perfect at what it does right now; in fact, the neural network’s abilities are quite limited. Even so, it could prove to be a powerful resource in helping editors, writers, and scientists scan a large number of studies for a quick idea of their contents. The system could also one day find applications beyond summarizing text, including machine translation and speech recognition.

The team didn’t set out to create the AI for the purpose described in this paper. In fact, they were working to create new AI-based approaches to tackle physics problems. During development, however, the team realized the approach they were working on could be used to solve other computational problems — such as language processing — much more efficiently than existing neural network systems.

“We can’t say this is useful for all of AI, but there are instances where we can use an insight from physics to improve on a given AI algorithm,” Soljačić adds.

Neural networks generally attempt to mimic the way our brains learn new information. The computer is fed many different examples of a particular object or concept to help it ‘learn’ what the key, underlying patterns of that element are. This makes neural networks our best digital tool for pattern recognition, for example identifying objects in photographs. However, they don’t do nearly so well when it comes to correlating information from hefty items of data, such as a research paper.

Various tricks have been used to improve their capability in that latter area, including techniques known as long short-term memory (LSTM) and gated recurrent units (GRU). All in all, however, classical neural networks are still ill-equipped for any sort of real natural-language processing, the authors say.

So, what they did was base their neural network on mathematical vectors, instead of on the multiplication of matrices (which is the classical neural-network approach). This is very deep math territory but, essentially, the system represents each word in the text by a vector (a line of a certain length, orientation, and direction) created and altered in a multidimensional space. Encyclopaedia Britannica defines a vector, in mathematics, as a “quantity that has both magnitude and direction but not position,” listing velocity and acceleration as examples.

The network uses each subsequent vector, as the words are read, to modify a starting vector. The final vector or set of vectors is then translated back into a string of words. The name the team gave this approach, thankfully, is much easier to wrap your head around: RUM (rotational unit of memory).
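
For the curious, here is a rough idea of what “modifying a starting vector as words are read” can look like in code. This is a toy sketch, not the team’s implementation: the three-dimensional word vectors, the fixed reference direction, and the function names are all made up for illustration, whereas a real network would learn these quantities from data. Each word simply rotates the running “memory” vector, which changes its direction but never its length.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def rotate_in_plane(h, a, b):
    """Rotate h by the angle between a and b, inside the plane spanned by a and b."""
    u = [x / norm(a) for x in a]                    # unit vector along a
    proj = dot(b, u)
    w = [y - proj * x for x, y in zip(u, b)]        # part of b perpendicular to a
    w_norm = norm(w)
    if w_norm < 1e-12:
        # a and b are (anti-)parallel: no unique plane, so leave h unchanged.
        return list(h)
    v = [x / w_norm for x in w]                     # second unit vector in the plane
    cos_t = proj / norm(b)
    sin_t = w_norm / norm(b)
    hu, hv = dot(h, u), dot(h, v)
    # Only the component of h lying in the (u, v) plane is turned.
    return [
        hi + (cos_t - 1.0) * (hu * ui + hv * vi) + sin_t * (hu * vi - hv * ui)
        for hi, ui, vi in zip(h, u, v)
    ]

# Hypothetical word vectors, invented for this example.
TOY_WORD_VECTORS = {
    "raccoons": [1.0, 0.2, 0.0],
    "carry": [0.1, 1.0, 0.3],
    "roundworm": [0.0, 0.4, 1.0],
}
REFERENCE = [1.0, 0.0, 0.0]  # assumed fixed direction each rotation starts from

def read_words(words, start):
    """Update a starting vector one word at a time."""
    state = list(start)
    for word in words:
        state = rotate_in_plane(state, REFERENCE, TOY_WORD_VECTORS[word])
    return state

final_state = read_words(["raccoons", "carry", "roundworm"], [0.0, 0.0, 1.0])
print(final_state, norm(final_state))
```

Because a rotation never stretches or shrinks the vector, the memory cannot blow up or fade away no matter how many words are read, which is one intuitive reason a rotation-based update might suit long texts.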

“RUM helps neural networks to do two things very well,” says Preslav Nakov, a senior scientist at the Qatar Computing Research Institute and paper co-author. “It helps them to remember better, and it enables them to recall information more accurately.”

RUM was developed to help physicists study phenomena such as the behavior of light in complex engineered materials, the team explains. However, the team soon realized that “one of the places where […] this approach could be useful would be natural language processing.”

Artificial summaries

Soljačić says he recalls a conversation with Mićo Tatalović, a former Knight Science Journalism fellow at MIT, a former editor at New Scientist magazine, and co-author of the study, who said that such a tool would be useful for his work as an editor trying to decide which papers to write about. Tatalović was, at the time, exploring AI in science journalism as his Knight fellowship project.

“And so we tried a few natural language processing tasks on it,” Soljačić says. “One that we tried was summarizing articles, and that seems to be working quite well.”

As a proof of concept, the team ran the same research paper through a conventional (LSTM-based) neural network and through their RUM-based system, asking each to produce a short summary. The end results were dramatically different. RUM can read through an entire research paper, not just its abstract, and summarise its content. The team even ran the present study through RUM (they were probably just showing off at this point).

Here’s the summary produced by the LSTM system:

“‘Baylisascariasis,’ kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed ‘baylisascariasis,’ kills mice, has endangered the allegheny woodrat and has caused disease like blindness or severe consequences. This infection, termed ‘baylisascariasis,’ kills mice, has endangered the allegheny woodrat.”

Here’s the one the RUM system produced:

“Urban raccoons may infect people more than previously assumed. 7 percent of surveyed individuals tested positive for raccoon roundworm antibodies. Over 90 percent of raccoons in Santa Barbara play host to this parasite.”

Here’s the neural network’s summary of the study we’re discussing:

“Researchers have developed a new representation process on the rotational unit of RUM, a recurrent memory that can be used to solve a broad spectrum of the neural revolution in natural language processing.”

You guys like my coverage better, though, right? Right…?

The paper “Rotational Unit of Memory: A Novel Representation Unit for RNNs with Scalable Applications” has been published in the journal Transactions of the Association for Computational Linguistics.