Does the rise of big data mean the downfall of privacy? Mobile technologies now allow companies to map our every physical move, while our online activity is tracked click by click. Throughout 2014, BuzzFeed’s quizzes convinced millions of users to divulge seemingly private responses to a host of deeply personal questions. Although BuzzFeed claimed to mine only the larger trends of aggregate data, identifiable, personalized information could still be passed on to data brokers for a profit.
But the big data revolution also benefits individuals who give up some of their privacy. In January of this year, President Obama formed a Big Data and Privacy Working Group, which concluded that big data was saving lives and taxpayer dollars, while also recommending new policies to govern big data practices. How much privacy do we really need? In advance of the Zócalo event “Does Corporate America Know Too Much About You?”, we asked experts the following question: How can we best balance the corporate desire for big data and the need for individual privacy?
Last week, the government of Singapore announced an increase in the cost of a toll at Bangunan Sultan Iskandar, the customs point for travelers crossing between Singapore and Malaysia. Motorists, who will have to pay over five times more than they previously paid, are furious. In protest, a group of hackers known simply as “The Knowns” has decided to use its skills to hack into and release corporate data on customers. The group released the mobile numbers, identification numbers, and addresses of more than 317,000 customers of Singapore-based karaoke company K Box.
In an era of “hacktivism,” data is necessarily vulnerable. So how do we negotiate among companies’ increasing need to collect and store our personal digital data, individuals’ privacy and ethical needs, and governments that are often slow to understand these needs and to respond to changes in this area?
If we borrow from recent work by psychologists and ethicists, we can agree upon a few preliminary guidelines: 1) Before a company collects private and personal data, consumers should be informed of what data the company intends to collect, how it will be stored and used, and what precautions are being taken to protect their information from data attacks. 2) Consumers should be given the ability to consent to, or opt out of, the collection of personal data. 3) Companies that collect and store personal data should periodically remind their customers of their data storage policies.
Although companies should have the freedom to innovate in their business models (such as by collecting new types of consumer data), these methods should not compromise the individuals on whom companies ultimately depend.
A big data society seems inevitable, and it promises much, but privacy (properly understood) must be an important part of any such society. To have both privacy and the benefits of big data, we need to keep four principles in mind:
First, we need to think broadly about privacy as more than just the keeping of secrets: privacy rules are the rules that govern personal information. Privacy rules are information rules. We already have rules protecting trade secrets, financial and medical data, library records, and computer security. We have to accept the inevitability that more rules (legal, social, and technological) will be needed to govern the creation of large data sets and the use of big data analytics.
Second, we need to realize that information does not lose legal protection just because it is held by another person. Most information has always existed in intermediate states. If I tell you (or my lawyer) a secret, it is still a secret; in fact, that’s the definition of a secret, or as we lawyers call it, a confidence. We must ensure that big data sets are held confidentially and in trust for the benefit of the people whose data is contained in them. Confidentiality rules will be essential in any big data future.
Third, we need to realize that big data isn’t magic, and it will not inevitably make our society better. We must insist that any solutions to social problems based on big data actually work. We must also insist that they will produce outputs and outcomes that support human values like privacy, freedom of speech, our right to define our own identities, and political, social, economic, and other forms of equality. In other words, we need to develop some big data ethics as a society.
Finally, it’s important to recognize that privacy and big data aren’t always in tension. Judicious privacy rules can promote social trust and make big data predictions better and fairer for all.
When asking “how can we best balance” the desires of corporations and the needs of individuals, we need to recognize that there are different “we”s involved here. Executives at Google and Facebook are interested in learning from big data, but they are, naturally, more concerned about their own individual privacy than the privacy of their users.
As a political scientist, I’m interested in what I can learn from moderately sized data, such as opinion polls, and from big data, such as voter files. And I naively act as if privacy is not a concern, since I’m not personally snooping through anyone’s particular data.
Survey organizations also profit from individuals’ data: They typically do not pay respondents, but rather rely on people’s goodwill and public-spiritedness to motivate them to volunteer their time answering surveys and helping researchers. In that sense, the issue of privacy is just one part of the traditional one-way approach to research, in which researchers, corporate and otherwise, profit from the uncompensated contributions of the public. It is not clear how to balance this unequal exchange.
Andrew Gelman is a professor of statistics and political science at Columbia University. His books include Bayesian Data Analysis and Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do. He blogs at http://andrewgelman.com/.