Nobody likes Excel spreadsheets, admit it. Nevertheless, a lot of people find themselves having to use them every once in a while. Some use them every day, though by the grace of providence that does not include yours truly. It does, however, include most people working in science. Ask anyone working in science what they hate most about their jobs and they’ll answer spreadsheets, second only to PowerPoint presentations. Yes, Microsoft makes beloved products. What most don’t realize is that they have one more big reason to hate Excel. According to researchers at Cardiff Metropolitan University, Excel, or rather its poor use, might be responsible for many errors that creep into research papers.
The scientists surveyed 17 researchers from the University of Newcastle neuroscience research center, ranging from PhDs to senior researchers. Not a single participant had received any formal training in Microsoft Excel, yet the vast majority rated their spreadsheet skills as ‘intermediate’. About 71% of the participants said they were ‘self-taught’ Excel users.
When asked whether they had someone ‘peer-review’ their raw spreadsheet data and results, only 20 percent answered ‘yes’. Most said they did the testing themselves or not at all.
This prompted the authors to conclude that, at least in neuroscience, most researchers are overconfident in their Excel skills. The repercussions for science could be far-reaching, although this is a very small study with a less than ideal sample size, and it is currently only posted on the preprint server arXiv.
For example, one well-documented case of spreadsheets going haywire involves gene names. When left on its default settings, Excel is known to convert gene names to dates and floating-point numbers. Not a lot of people know this: a 2016 study published in Genome Biology found that “one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.” A gene named “SEPT4”, which corresponds to the gene Septin 4, is interpreted by the software as “September 4th”, for instance. The program also tends to mistake identification codes like “2310009E13” for numbers in scientific notation; in this particular instance, the code would be read as 2.310009 × 10^19.
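To get a feel for which identifiers are at risk, here is a minimal Python sketch that flags names resembling a date or a number in scientific notation before they ever reach a spreadsheet. The patterns are a rough heuristic of my own, not Excel's actual parsing rules, and the function name `classify_identifier` is made up for illustration.

```python
import re

# Heuristic patterns (an assumption, not Excel's real rules):
# a month-like prefix followed by one or two digits, e.g. SEPT4 or MARCH1.
DATE_LIKE = re.compile(
    r"^(?:JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
# Digits, the letter E, then more digits, e.g. 2310009E13.
SCI_LIKE = re.compile(r"^\d+(?:\.\d+)?E[+-]?\d+$", re.IGNORECASE)


def classify_identifier(name: str) -> str:
    """Return a rough risk label for a gene symbol or ID code."""
    s = name.strip()
    if DATE_LIKE.match(s):
        return "date-like"
    if SCI_LIKE.match(s):
        return "scientific-notation-like"
    return "ok"


if __name__ == "__main__":
    for identifier in ["SEPT4", "2310009E13", "TP53"]:
        print(identifier, "->", classify_identifier(identifier))
    # Python's float parser reads the ID the same way a spreadsheet would:
    # 2310009 x 10^13, i.e. 2.310009 x 10^19.
    print(float("2310009E13"))
```

Running a gene list through a check like this before pasting it into a spreadsheet will not catch everything, but it does surface the two failure modes described above.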
“The first is that spreadsheet errors are rare on a per-cell basis, but in large programs, at least one incorrect bottom-line value is very likely to be present. The second is that errors are extremely difficult to detect and correct. The third is that spreadsheet developers and corporations are highly overconfident in the accuracy of their spreadsheets. The disconnect between the first two conclusions and the third appears to be due to the way human cognition works. Most importantly, we are aware of very few of the errors we make. In addition, while we are proudly aware of errors that we fix, we have no idea of how many remain, but like Little Jack Horner we are impressed with our ability to ferret out errors.”
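The first point in that quote is easy to check with a bit of arithmetic. The sketch below assumes, purely for illustration, that each cell has the same small chance of being wrong and that errors are independent of one another; the 1% rate and the 500-cell sheet are hypothetical numbers, not figures from the study.

```python
def prob_at_least_one_error(per_cell_error_rate: float, n_cells: int) -> float:
    """Chance that at least one of n_cells is wrong, assuming independent errors."""
    # P(no errors) = (1 - p)^n, so P(at least one) = 1 - (1 - p)^n.
    return 1.0 - (1.0 - per_cell_error_rate) ** n_cells


if __name__ == "__main__":
    # Hypothetical inputs: a 1% per-cell error rate across sheets of growing size.
    for cells in (1, 10, 100, 500):
        print(cells, round(prob_at_least_one_error(0.01, cells), 4))
```

Under these assumptions, an error that is rare in any single cell becomes close to certain somewhere in a sheet of a few hundred cells.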
So, what to do? Obviously, being aware of how Excel parses cells and how it handles data can save you a lot of trouble. But if using Excel is boring, wait until you read the manual. Alternatively, scientists might want to learn other spreadsheet software that is better suited to their field.