In November, two computer science graduate students at Stanford University began a study of phone metadata privacy. They figured that this kind of information could be extremely sensitive, but they didn’t expect to find much evidence one way or the other.
They were wrong.
“The degree of sensitivity among contacts took us aback,” study co-author Jonathan Mayer wrote on his blog, “Web Policy,” where the research findings were published Wednesday.
“Participants had calls with Alcoholics Anonymous, gun stores, NARAL Pro-Choice, labor unions, divorce lawyers, sexually transmitted disease clinics, a Canadian import pharmacy, strip clubs, and much more,” Mayer reported. “This was not a hypothetical parade of horribles. These were simple inferences, about real phone users, that could trivially be made on a large scale.”
In a phone interview, Mayer said: “One of the arguments that has cropped up again and again in the phone metadata privacy debate is, ‘What’s the big deal? It’s just phone numbers.’ And we thought, ‘OK, let’s take a look at how hard it is to take a phone number and map it to an individual or business or professional service.’ We found it was very, very easy.”
In only three months, Mayer and co-author Patrick Mutchler ferreted out an amazing amount of information about the 546 participants in their study — all volunteers who ran the MetaPhone app on their Android smartphones.
The two grad students identified the participants’ contacts by matching the phone numbers they called against the Yelp and Google Places directories. The participants contacted 33,688 unique numbers. Of those, 6,107, or 18 percent, were identified.
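The matching step described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function names, the sample directory and the sample phone numbers are all hypothetical, and the directory is assumed to be a simple mapping from listed numbers to business names.

```python
import re

def normalize(number: str) -> str:
    """Strip punctuation and keep the last 10 digits (assumes US numbers)."""
    digits = re.sub(r"\D", "", number)
    return digits[-10:]

def identify_contacts(called_numbers, directory):
    """Match unique called numbers against a business directory.

    `directory` is a hypothetical stand-in for a public listing:
    a dict mapping a phone number to a business name.
    Returns the identified numbers and the count of unique numbers.
    """
    unique = {normalize(n) for n in called_numbers}
    lookup = {normalize(k): v for k, v in directory.items()}
    identified = {n: lookup[n] for n in unique if n in lookup}
    return identified, len(unique)

# Made-up sample data, for illustration only.
directory = {
    "(415) 555-0101": "Example Neurology Group",
    "415-555-0102": "Example Gun Store",
}
calls = ["+1 415 555 0101", "4155550101", "415.555.0199", "415-555-0102"]

identified, total_unique = identify_contacts(calls, directory)
sample_share = len(identified) / total_unique

# The figures reported in the article: 6,107 identified of 33,688 unique numbers.
reported_share = 6107 / 33688
print(f"sample: {len(identified)} of {total_unique} identified")
print(f"reported: {reported_share:.1%}")
```

The point of the sketch is how little is involved: once numbers are put in a common format, identification is a dictionary lookup that scales to any number of callers.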
‘You Don’t Need a Ph.D. to Figure This Out’
The computer scientists analyzed individual calls to numbers that could be deemed sensitive — anything a person might not want to be public, such as political or religious affiliation, health status, gun ownership or finances. They created 11 sensitive categories in the study, and then they analyzed patterns of calls.
One participant, for example, communicated with multiple local neurology groups, a specialty pharmacy, a rare-condition management service and a hotline for a pharmaceutical used only to treat relapsing multiple sclerosis.
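A pattern like that one can be surfaced mechanically. The sketch below is a hedged illustration rather than the study's method: it assumes each identified call has already been labeled with a category, and it flags any participant who contacted several distinct businesses in the same category. The category labels, business names and the threshold are hypothetical.

```python
from collections import defaultdict

def category_patterns(calls, min_distinct=2):
    """Group identified calls by (participant, category).

    `calls` is an iterable of (participant_id, business_name, category)
    tuples. A pair is reported only when the participant contacted at
    least `min_distinct` different businesses in that category.
    """
    seen = defaultdict(set)
    for participant, business, category in calls:
        seen[(participant, category)].add(business)
    return {
        key: sorted(names)
        for key, names in seen.items()
        if len(names) >= min_distinct
    }

# Made-up records, for illustration only.
calls = [
    ("A", "Neurology Group 1", "health"),
    ("A", "Neurology Group 2", "health"),
    ("A", "Specialty Pharmacy", "health"),
    ("A", "Pizza Place", "food"),
    ("B", "Example Gun Store", "firearms"),
]

patterns = category_patterns(calls)
print(patterns)
```

In this sample only participant A's health-related calls are flagged, because a single call to one business says far less than repeated contact with several related ones.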
“You don’t need a Ph.D. in computer science to figure out what’s going on there,” said Mayer, 27, who got a law degree from Stanford last year and now lives in San Francisco. He has been working on his doctorate for more than four years.
Mayer and Mutchler used the crowdsourced data to answer some questions that have acquired fresh urgency since revelations of National Security Agency surveillance, especially collection of phone records, began surfacing last June. They wanted to know two main things, Mayer wrote: Is it easy to draw sensitive inferences from phone metadata? How often do people conduct sensitive matters by phone?
“We were motivated by the ongoing debate about this metadata held by the NSA and telecommunications companies and questions about under what conditions should this data be able to be looked at, analyzed, sold, aggregated and so on,” Mayer said. “Much of that debate depends on factual underpinnings, about what the privacy properties of this data look like. Patrick and I have been attempting to understand those privacy properties.”
Mayer said he and Mutchler succeeded beyond their wildest dreams in terms of the number and diversity of participants, who lived all over the country and came from various walks of life, with a mix of jobs and political perspectives. But he said the group was not statistically representative because volunteers had to have an Android phone and a Facebook account.
He was surprised, he said, that he could draw inferences about so many people from such a short window of phone records. By comparison, he said, the NSA appears to retain such records for about five years.
“If we’re able to draw these sorts of inferences, it’s strongly indicative of the government’s ability to draw sensitive inferences about large numbers of Americans,” Mayer said. “One of the great sources of concern here is that what used to take some significant shoe leather to infer now can be done in a very automated fashion.”
The study opens by noting that President Obama has stressed that the NSA is “not looking at content,” that Sen. Dianne Feinstein has said “this is just metadata” and that a judge dismissed potentially sensitive inferences as a “parade of horribles.” It then cites opposing views from those worried about the privacy risks posed by metadata.
As for what effect the study might have, Mayer said, “It’s hard to predict. I’d imagine it might play some role in upcoming court battles. … This sort of analysis might inform what the judiciary does.”
Have the study’s findings made him paranoid at all? “If there were fixes, I’d certainly consider them,” Mayer said. “But this is the way phones work.”
But he doesn’t believe that the desire for privacy is a hopeless quest.
“For a long time, the debate’s been in the abstract,” Mayer said. “(For example), there’s something that makes individuals feel uneasy. Or there’s some data that’s held by some entity that’s untrustworthy. These views lack a certain rigor that is (essential) to gain substantial traction in policy circles. By changing the way we argue about privacy and making it a more rigorous debate about specific privacy risks — and quantifying those risks — I think there’s lots of room for gain.”