What's Not Included in Facebook's 'Download Your Data'

Facebook says users own their data and touts its "download your data" tool. But the download doesn't include everything Facebook knows about you.
HOTLITTLEPOTATO

When members of Congress asked Mark Zuckerberg earlier this month who owns Facebook users’ personal data, the Facebook CEO had a convenient response. Eight times during his testimony, he cited a feature called “Download Your Data,” to show that Facebook users really are in control.

“Yes, Congressman. We have a ‘download your information’ tool. We've had it for years,” Zuckerberg told US representative Jerry McNerney (D-California). “You can go to it in your settings and download all of the content that you have on Facebook.”

It’s true that users can download and review a lot of information with the tool, including status updates, messages they thought were deleted, drafts of videos that were never published, facial-recognition data, a list of people they unfriended, and, for some Android users, a list of phone calls and text messages. One reporter’s reaction after using the tool: “Yikes.” Wikileaks, Julian Assange, and alt-right provocateurs recommend giving it a whirl. And, just ahead of new, tougher European privacy rules, Facebook made some upgrades, including the ability to download your history of searches inside Facebook and location history, which were previously only viewable in a user’s Activity Log.

But “Download Your Data” hardly tells you everything Facebook knows about you. Among the information not included:

  • information Facebook collects about your browsing history
  • information Facebook collects about the apps you visit and your activity within those apps
  • the advertisers who uploaded your contact information to Facebook more than two months earlier
  • ads that you interacted with more than two months prior

Download Your Data is particularly spotty when it comes to the information Facebook taps to display ads. Typically, Facebook uses information it collects or buys to place users into categories that advertisers can target. This can include data a user provides explicitly (your age), implicitly (which browser you use) or unknowingly (information on purchases from loyalty cards).

Inferences From Data

The company also makes inferences about users by linking bits of information. Those inferences can place a user into one or more of these 98 categories previously reported by the Washington Post, such as income or home ownership, or these 52,000 attributes identified by ProPublica, such as breastfeeding in public.

Facebook says these inferences are included in Download Your Data, under “Ad Topics.” When I downloaded my data, however, Ad Topics only featured brands, publications, celebrities, and general topics, like The Wall Street Journal, Melinda Gates, and basic income.

The download tool did not reveal other, more unsettling attempts to define me. Those are included in a (difficult to find) list called “Your Categories” under Facebook’s Ad Preferences section. There, Facebook identified me as a newlywed, away from my family, who travels frequently, has very liberal politics, is close friends with expats, and whose multicultural affinity is African-American.

Why aren’t those relationships and political categories included in Download Your Data? Facebook says Ad Interests are based on a user’s activity on Facebook, including such things as the pages you like. By contrast, Facebook says Your Categories reflects a user’s activity both on and off Facebook. Later the company told me that Ad Topics is a subset of Ad Interests, but it did not say what is excluded.

Facebook offered even less clarity on other issues. When I asked whether the download tool showed all location data that Facebook collected from a user’s phone, including GPS, rather than just instances where a user knowingly checked in, the company answered a question I had not asked, telling me that GPS location information is controlled by a device’s settings. I tried a few more times. But Facebook would only say that users with location history enabled can manage the data through their Activity Log and that device location includes GPS. Again, questions I had not asked.

Facebook’s data-collection and -handling practices are so opaque that even Zuckerberg became confused and had to correct himself before the House Energy and Commerce Committee. Zuckerberg initially said users could download a list of websites that Facebook knows they visited, as well as inferences Facebook makes about users for advertising purposes. The Facebook CEO later asked to “clarify” his statements, stating that Facebook temporarily stores browsing history for the purpose of creating a set of “ad interests.”

The difference between what Facebook knows about you and what it includes in Download Your Data underscores mounting consumer privacy concerns and the limits of self-regulation. Zuckerberg presented the tool as a check on its power, but Facebook controls what it reveals. Likewise, Zuckerberg says people own their data — it was right there in his notes — but the company considers the insights squeezed from that data to be its property. Using those insights, Facebook generates $40 billion in annual revenue, hefty profit margins, and a market value approaching $500 billion.

The practice of collecting this type of data is neither new, nor unique to Facebook, but the gaps and omissions in “Download Your Data” offer perspective on Facebook’s recent emphasis on transparency and rebuilding users’ trust.

Sandy Parakilas, a former Facebook operations manager, says the company is generating economic value by using data about you “to predict how you’re going to act and manipulate you.”

“You have no right over those inferences, that’s a pretty terrible position to be in,” Parakilas says. “This is a company with $40 billion in cash and some of the best engineers in the world. They’re building an AI to predict your behavior and they can’t give a file of your browsing history? Come on.”

Fatemeh Khatibloo, a principal analyst at Forrester, says the download tool shows users raw data about themselves, but not how individual scraps of data are combined and analyzed. “The fact that you like Snickers candy and Red Bull and you like that stuff at 2 o’clock in the morning, they’re using that information to determine you’re a club kid,” she says. However, those categories “don’t exist until someone wants them.” Once advertisers ask to target club kids, “that’s how Facebook is going to figure out what that audience size is.”

Under Pressure in Europe

Facebook’s data-handling practices, including the limits of Download Your Data, are particularly important in Europe, the company’s second-most lucrative market after North America, representing a quarter of Facebook’s revenue in 2017. Individuals in the EU have had the right to ask for access to the data a company has collected on them. New privacy rules that will take effect next month include stronger provisions for transparency and data portability, as well as enhanced rights to go after companies that don’t comply.

In fact, some of the information available through Download Your Data is the result of dogged work by privacy activists in Europe. The “Advertisers who uploaded your contact information” category was added in early 2017, after months of requests from Paul-Olivier Dehaye, a Belgian mathematician and cofounder of PersonalData.IO, who wanted to know what Facebook knew about his browsing habits and which advertisers had uploaded his contact information. But the information only goes back two months. And Ad Topics is a current snapshot, rather than a historical list.

Dehaye remains frustrated by Facebook’s responses to his requests. “The constant redirection (and the automated replies) are pure annoyance,” he wrote to Facebook in December 2016. In a February 2017 email to Facebook, he wrote, “It feels a bit disingenuous,” after the company added the “Advertisers who uploaded your contact information” list because of him but didn’t mention that it had done so. The fact that Facebook uses different terminology for its various inferences and categories “makes accountability really hard, and really hard to explain,” Dehaye says.

Secrecy is key to institutionalizing surveillance capitalism, says Shoshana Zuboff, a professor at Harvard Business School and author of the upcoming book, Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power. Understanding how data is used is an existential threat. “They can’t let people know what they’re actually doing because that might really trigger the kind of collective disagreement that would be fatal to their operations.”

Navigating Facebook