A team of researchers from the U.S. and Hong Kong is working to develop new methods of statistical analysis that may let us predict the risks of very rare but dramatic events such as pandemics, earthquakes, or meteorite strikes happening in the future.

Our lives over these last two years have been profoundly marked by the pandemic — and although researchers warned us about the risk of a pandemic, society was very much surprised. But what if we could statistically predict the risk of such an event happening in advance?

An international team of researchers is working towards that exact goal by developing a whole new way to perform statistical analyses. Typically, events of such rarity are very hard to study through the prism of statistical methods, as they simply happen too rarely to yield reliable conclusions.

The method is in its early stages and, as such, hasn’t proven itself. But the team is confident that their work can help policymakers better prepare for world-spanning, dramatic events in the future.

### Black swans

“Though they are by definition rare, such events do occur and they matter; we hope this is a useful set of tools to understand and calculate these risks better,” said mathematical biologist Joel Cohen, a professor at the Rockefeller University and at the Earth Institute of Columbia University, and a co-author of the study describing the findings.

The team hopes that their work will give statisticians an effective tool for analyzing datasets that contain very sparse data points, as is the case for very dramatic (positive or negative) events. This, they argue, would give government officials and other decision-makers a way to make informed decisions when planning for such events in the future.

Statistics is by now a tried-and-true field of mathematics. It’s one of our best tools for making sense of the world around us and, generally, it serves us well. However, the quality of the conclusions statistics can draw from a dataset depends directly on how rich that dataset is and on the quality of the information it contains. As such, statistics has a very hard time dealing with events that are exceedingly rare.

That hasn’t stopped statisticians from trying, over the last century or so, to apply their methods to rare-but-extreme events. It’s still a relatively new field of research in the grand scheme of things, so we’re still learning what works here and what doesn’t. Just as a worker needs the appropriate tool for the job at hand, statisticians need to apply the right calculation method to their dataset: which method they employ has a direct impact on which conclusions they draw, and on how reliably these reflect reality.

Two important parameters when processing a dataset are the average value and the variance. You’re already familiar with what an average value is. The variance, however, shows how far apart the values that make up that average are. For example, both 0 and 100, as well as 49 and 51, average out to 50; the first set, however, has a much larger variance than the latter.
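The arithmetic in that example can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names `wide` and `narrow` are only labels chosen here for the two pairs of numbers.

```python
from statistics import mean, pvariance

# The two pairs of numbers from the example above.
wide = [0, 100]
narrow = [49, 51]

for label, values in (("wide", wide), ("narrow", narrow)):
    # pvariance is the population variance: the average squared
    # distance of each value from the mean.
    print(label, "mean:", mean(values), "variance:", pvariance(values))
```

Both pairs have a mean of 50, but the variance of the first pair is 2,500 while the variance of the second is just 1.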

For typical datasets, the average value and the variance are both finite numbers. For the events that are the subject of this study, however, values that are rare but enormous can push these numbers towards infinity. World wars, for example, have been extremely rare events in human history, but each one has also had an incredibly large effect, shaping the world into what it is today.

“There’s a category where large events happen very rarely, but often enough to drive the average and/or the variance towards infinity,” said Cohen.
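To get a feel for what an average that heads towards infinity looks like, here is a small illustrative sketch. It is not the authors' code. It assumes a Pareto distribution (a textbook heavy-tailed distribution, chosen here only as an example) and, to stay reproducible, uses evenly spaced quantiles instead of random draws. The function names and the two `alpha` values are hypothetical choices for this illustration.

```python
def pareto_quantile_sample(alpha, n):
    """Return n evenly spaced quantiles of a Pareto distribution (minimum value 1).

    Smaller alpha means a heavier tail. Using quantiles instead of random
    draws is an assumption made here so the output is deterministic.
    """
    return [((i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]


def average(values):
    return sum(values) / len(values)


sizes = [100, 1_000, 10_000]
# alpha = 3.0: the theoretical mean is finite (1.5).
light = {n: average(pareto_quantile_sample(3.0, n)) for n in sizes}
# alpha = 0.5: the theoretical mean is infinite.
heavy = {n: average(pareto_quantile_sample(0.5, n)) for n in sizes}

for n in sizes:
    print(f"n={n:>6}  light-tailed mean={light[n]:.3f}  heavy-tailed mean={heavy[n]:.1f}")
```

In the light-tailed case the average settles near 1.5 as more data is added. In the heavy-tailed case it keeps growing roughly in step with the sample size, because a few enormous values dominate the sum.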

Such datasets require new tools to be properly handled, the team argues. If we can make heads or tails of them, however, we could be much better prepared for such events, and see a greater return on investments into preparedness. Governments and other ruling bodies would obviously stand to benefit from having such information on hand.

Being able to accurately predict the risk of dramatic events would also benefit us as individuals and provide tangible benefits to society. From allowing us to better plan out our lives (who here wouldn’t have liked to know in advance that the pandemic was going to happen?), to better preparing for threatening events, to giving us arguments for lower insurance premiums, such information would definitely be useful to have. If nothing bad is likely to happen during my lifetime, you could argue, wouldn’t it make sense for my life insurance premiums to be lower? The insurance industry in the US alone is worth over $1 trillion, and making the system more efficient could amount to major savings.

### But does it work?

The authors started from mathematical models used to calculate risk and examined whether they can be adapted to analyze low-probability, very high-impact events with infinite mean and variance. The standard approach these methods use involves semi-variances: the practice of separating the dataset into ‘below-average’ and ‘above-average’ halves, then examining the risk in each. Still, this didn’t provide reliable results.
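The below-average and above-average split can be sketched in code. This is a minimal illustration under one common definition of semi-variance (squared distances from the mean, summed separately on each side and divided by the total number of values); the article does not spell out the exact formula the authors use, so treat that definition, the function name `semivariances`, and the made-up `losses` numbers as assumptions.

```python
def semivariances(values):
    """Return (mean, lower semi-variance, upper semi-variance).

    Assumed definition: squared deviations from the mean, summed separately
    for values below and above the mean, each divided by the full count.
    """
    n = len(values)
    mean = sum(values) / n
    lower = sum((x - mean) ** 2 for x in values if x < mean) / n
    upper = sum((x - mean) ** 2 for x in values if x > mean) / n
    return mean, lower, upper


# Made-up numbers: four small values and one extreme one.
losses = [1, 2, 3, 4, 100]
mean, lower, upper = semivariances(losses)
print("mean:", mean)
print("lower semi-variance:", lower)
print("upper semi-variance:", upper)
print("ordinary variance (sum of the two):", lower + upper)
```

With this definition the two halves add up to the ordinary variance, and the single extreme value accounts for most of it on the above-average side.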

What does work, the authors explain, is to compare the log (logarithm) of the average with the log of the semi-variance in each half of the dataset. Logarithms are the inverse of exponentials, just like division is the inverse of multiplication. They’re a very powerful tool when you’re dealing with massive, long numbers, as they simplify the picture without cutting out any meaningful data, which makes them ideal for studying the kind of numbers produced by rare events.

“Without the logs, you get less useful information,” Cohen said. “But with the logs, the limiting behavior for large samples of data gives you information about the shape of the underlying distribution, which is very useful.”
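A rough way to see why the logs help is to watch what happens as a heavy-tailed dataset grows. The sketch below is an illustration only, not the method from the paper: it assumes a Pareto distribution with a hypothetical tail parameter of 0.5, uses evenly spaced quantiles in place of random draws so the result is reproducible, and reads "comparing the logs" as taking the ratio of the log of the average to the log of the above-average semi-variance.

```python
import math


def pareto_quantile_sample(alpha, n):
    """n evenly spaced quantiles of a Pareto distribution (assumed example data)."""
    return [((i + 0.5) / n) ** (-1.0 / alpha) for i in range(n)]


def mean_and_upper_semivariance(values):
    """Mean and above-average semi-variance (squared deviations / total count)."""
    n = len(values)
    mean = sum(values) / n
    upper = sum((x - mean) ** 2 for x in values if x > mean) / n
    return mean, upper


rows = []
for n in (100, 1_000, 10_000):
    mean, upper = mean_and_upper_semivariance(pareto_quantile_sample(0.5, n))
    # The raw numbers explode, but the ratio of their logs barely moves.
    ratio = math.log(mean) / math.log(upper)
    rows.append((n, mean, upper, ratio))
    print(f"n={n:>6}  mean={mean:.3g}  upper semi-variance={upper:.3g}  log ratio={ratio:.3f}")
```

In this toy setup the average and the semi-variance both grow without bound as the sample gets larger, while the ratio of their logs changes only slightly. A quantity that settles down in this way is the kind of thing that can carry information about the shape of the underlying distribution.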

While this study isn’t the be-all and end-all of the topic, it does provide a strong foundation for other researchers to build upon. The findings are new and in their infancy, but they do hold promise. Right now, they’re the closest we’ve gotten to a formula that can estimate the risk of something big happening.

“We think there are practical applications for financial mathematics, for agricultural economics, and potentially even epidemics, but since it’s so new, we’re not even sure what the most useful areas might be,” Cohen said. “We just opened up this world. It’s just at the beginning.”

The paper “Taylor’s law of fluctuation scaling for semivariances and higher moments of heavy-tailed data” has been published in the journal *Proceedings of the National Academy of Sciences*.