Anonymizing smartphone data is no longer enough -- users can be identified with just a few details

There are solutions to anonymize data, but they need to be carefully implemented.

Mihai Andrei
January 27, 2022 @ 1:12 am

Vast amounts of user data are available to smartphone companies. Companies assure us that this data is anonymized: stripped of personal indicators that could pinpoint individual users. But these assurances are hollow, a new study claims: a skilled attacker can identify individuals in anonymous datasets.

Image credits: Olia Nayda.

When the pandemic started and lockdowns were enforced, the world seemed to grind to a halt. You could see that easily just by looking around, but the data also confirmed it. For instance, mobility trends published by the likes of Apple and Google showed that a significant part of the population had stopped commuting to work, and that people were increasingly using cars rather than public transit.

At first, users were understandably spooked by the data. Do tech companies know where I go and what I do? That’s not how it goes, the companies assured us. The data is anonymized: they know a user went somewhere and did something, but they don’t know who that user is. Other apps also scoop up vast quantities of data from your smartphone, either for ad targeting or for other purposes, though in many cases they are still legally required to anonymize the data, removing identifying details like names and phone numbers.

But that’s no longer enough. With just a few details (for instance, how people communicate in an app like WhatsApp), researchers were able to identify many users from anonymized data. Yves-Alexandre de Montjoye, associate professor at Imperial College London and one of the study authors, told AFP it’s time to “reinvent what anonymisation means”.

What is anonymous?

The researchers started by looking at anonymized data from around 40,000 smartphone users, mostly gathered from messaging apps. They then “attacked” the data, mimicking the process a malicious actor would follow. Essentially, this involved searching for patterns in the data to see whether individual users could be identified.

With only the direct contacts included in the dataset, they were able to pinpoint individual users 15% of the time. When, in addition, further interactions between those primary contacts were included, they were able to identify 52% of the users.
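To make the idea of such an attack concrete, here is a toy sketch in Python. It is not the method used in the study: it only illustrates, under simplifying assumptions, how a pattern of interactions can act as a fingerprint even after every name and contact ID has been replaced. The data, the function names (`fingerprint`, `distance`, `reidentify`), and the matching rule (nearest sorted interaction counts) are all made up for illustration.

```python
from math import sqrt


def fingerprint(interactions):
    """Per-contact interaction counts, sorted largest first.

    Sorting discards the contact IDs, so the profile survives
    re-pseudonymisation of both the user and their contacts.
    """
    return sorted(interactions.values(), reverse=True)


def distance(a, b):
    """Euclidean distance between two profiles, zero-padded to equal length."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reidentify(target, candidates):
    """Return the pseudonym in `candidates` whose profile is closest to `target`."""
    target_fp = fingerprint(target)
    return min(
        candidates,
        key=lambda pid: distance(target_fp, fingerprint(candidates[pid])),
    )


# Hypothetical data: one known user in week 1, three pseudonyms in week 2.
week1_target = {"c1": 40, "c2": 12, "c3": 3}
week2 = {
    "p_a": {"x9": 5, "x4": 5, "x2": 4},
    "p_b": {"x7": 38, "x1": 14, "x5": 2},
    "p_c": {"x3": 90},
}

print(reidentify(week1_target, week2))  # prints "p_b"
```

Even though no identifier is shared between the two weeks, the shape of the target's activity is enough to single out one pseudonym in this tiny example. Adding information about the contacts' own interactions, as the researchers did, would give an attacker a richer profile to match against.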

This doesn’t mean that we should give up on anonymization, the researchers explain. However, we should strengthen what this anonymization means, making sure that the data is indeed anonymous.

“Our results provide evidence that disconnected and even re-pseudonymised interaction data remain identifiable even across long periods of time,” the researchers wrote. “These results strongly suggest that current practices may not satisfy the anonymisation standard set forth by (European regulators) in particular with regard to the linkability criteria.”

“Our results provide strong evidence that disconnected and even re-pseudonymized interaction data can be linked together,” the researchers conclude.

The researchers suggest restricting access to large datasets to simple question-and-answer systems, or using differential privacy, which adds calibrated random noise to results so that no individual's data can be singled out.
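As a rough illustration of how the two suggestions can fit together, the Python sketch below answers a counting question about a dataset and adds Laplace noise to the answer, a standard way of achieving differential privacy for counts. It is a minimal sketch and not the system the researchers propose: the function names (`laplace_noise`, `private_count`), the sample records, and the choice of `epsilon` are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Answer "how many records match?" with differentially private noise.

    Adding or removing one person changes a count by at most 1,
    so Laplace noise with scale 1 / epsilon is used.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical dataset: number of messages sent per user in a week.
messages_per_user = [3, 120, 45, 0, 17, 88, 230, 5, 61, 9]

# The analyst only sees the noisy answer, never the raw records.
noisy = private_count(messages_per_user, lambda m: m > 50, epsilon=0.5)
print(f"Users with more than 50 messages (noisy): {noisy:.1f}")
```

A smaller `epsilon` means more noise and stronger privacy, while a larger one gives more accurate answers. The analyst can still learn aggregate trends, but the presence or absence of any single user has only a limited effect on what they see.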

The study was published in Nature Communications.
