For Sweeney, data is never bias-free

  • by Nick White
  • last updated November 15, 2020

Meet the Harvard professor uncovering discrimination and vulnerabilities in data sharing

When we think of data privacy concerns, we tend to think of tech giants like Facebook or Google selling our browsing habits, age and location to the highest bidder in order to send us targeted ads or, in a more sinister scenario, cooperating with governments to surveil citizens’ communications.

But Latanya Sweeney believes the danger surrounding data misuse goes beyond risks to the isolated individual. For her, data privacy is tightly intertwined with societal issues of discrimination, bias and threats to the most vulnerable among us. Technology is not just a tool we’re constantly developing and using. It’s also a mirror reflecting societal impulses, desires and prejudices, including deeply entrenched attitudes towards race, gender and religion.

Sweeney is no stranger to overcoming obstacles. Just look at her resume for evidence of that. She earned a Ph.D. in computer science from MIT in 2001, making her the first Black woman to do so. In a field where men, and white men in particular, are so predominant, being a Black woman in computer science can be lonely. But it also meant Sweeney was able to identify and address issues in data privacy, and deeper implications of technology, in ways that her white male colleagues hadn’t considered. One of her research pillars rests on the conclusion that identifiers such as names and zip codes can be misused for discriminatory purposes. Add to that the chilling fact that 87% of the U.S. population can be uniquely identified by birth date, gender and zip code, and the necessity to uncover the biases implicitly and explicitly tied to these markers becomes all the more urgent.

Sweeney is the founding director of the Data Privacy Lab, launched in 2001 at Carnegie Mellon University, where she was a professor of Computer Science, Technology and Policy. The lab moved to Harvard University in 2011, where Sweeney is professor of the practice of government and technology. The Data Privacy Lab, currently incubating more than 100 projects, researches how data privacy issues can exacerbate social issues and works to identify the nature and extent of data privacy problems as society becomes increasingly technologically empowered. It aims to suggest data sharing practices that maintain privacy and confidentiality.

Algorithms and discrimination

The effects that data privacy, or the lack thereof, can have on discriminatory practices became apparent in her groundbreaking 2013 study on discrimination in online advertising. In “Discrimination in Online Ad Delivery,” Sweeney’s research revealed that the ads delivered by Google AdSense were more likely to be related to criminal activity if the name being searched was typically associated with Black people. In fact, a Black-identifying name such as Leroy, Darnell, or Keisha was 25% more likely to get an ad suggestive of an arrest record. The negative bias implicit in these search engine results could have detrimental consequences for people applying for jobs or awards, or in any other situation where their name would be searched online.

Sweeney says she first became aware of the issue when a colleague Googled her to find an old paper and was presented with an ad that said, “Latanya Sweeney. Arrested?”

“I was shocked,” she wrote. “I have never been arrested, and after clicking the link and paying the requisite fee, I found the company had no arrest record for anyone with my name either. We then entered his name, Adam Tanner, a white male name, and an ad for the same company appeared, except the ad for him had no mention of arrest or a criminal record.”

Her work uncovering discrimination in algorithms and data collection doesn’t stop there, though. As the founding Editor-in-Chief of the Journal of Technology Science, she has reported on various kinds of discriminatory practices facilitated by data collection, including the revelation that SAT prep services charge customers in zip codes with high proportions of Asian residents nearly double the average price, regardless of those customers’ actual income.

Medical data and privacy concerns

Sweeney has also made waves in the medical data field. In a 2015 article entitled “Only You, Your Doctor, And Many Others May Know,” she reported on an issue she had been concerned with since the 1990s. Through her research, Sweeney found it was alarmingly easy to link people with their supposedly anonymized health records. During a 1997 study on healthcare data security, she successfully linked then-Massachusetts Governor William Weld to his medical records using publicly accessible data. This process is known as “re-identification”: matching details in a de-identified dataset to distinct persons with enough confidence to be able to contact them.
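
To make the linking step concrete, here is a minimal sketch in Python of how a re-identification of this kind could work in principle. It is not Sweeney's actual method or data: every record, name and field name below is invented for illustration, and the choice of birth date, sex and zip code as the linking fields is an assumption borrowed from the 87% figure cited earlier in this article.

```python
from collections import defaultdict

# Hypothetical toy data: all records and field names are invented.
deidentified_health = [
    {"birth_date": "1950-03-12", "sex": "M", "zip": "02139", "diagnosis": "condition A"},
    {"birth_date": "1982-11-02", "sex": "F", "zip": "02139", "diagnosis": "condition B"},
    {"birth_date": "1982-11-02", "sex": "F", "zip": "02139", "diagnosis": "condition C"},
]
public_records = [
    {"name": "Person One", "birth_date": "1950-03-12", "sex": "M", "zip": "02139"},
    {"name": "Person Two", "birth_date": "1982-11-02", "sex": "F", "zip": "02139"},
    {"name": "Person Three", "birth_date": "1982-11-02", "sex": "F", "zip": "02139"},
]

# Fields shared by both datasets that are not names but can still single people out.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")


def key(record):
    """Build the linking key from the shared quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def group_by_key(rows):
    """Group rows by their quasi-identifier combination."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return groups


def link_unique(health_rows, public_rows):
    """Return (name, health_row) pairs whose key is unique in BOTH datasets."""
    health_groups = group_by_key(health_rows)
    public_groups = group_by_key(public_rows)
    matches = []
    for k, rows in health_groups.items():
        candidates = public_groups.get(k, [])
        # Only an unambiguous one-to-one match counts as a re-identification here.
        if len(rows) == 1 and len(candidates) == 1:
            matches.append((candidates[0]["name"], rows[0]))
    return matches


matches = link_unique(deidentified_health, public_records)
for name, row in matches:
    print(name, "->", row["diagnosis"])
```

In this toy example only one person is re-identified, because the other two share the same birth date, sex and zip code and so cannot be told apart. That ambiguity is exactly what privacy-preserving data releases try to create on purpose.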

Her discovery prompted swift action, most notably regarding the Health Insurance Portability and Accountability Act (HIPAA) of 1996, which was established to protect sensitive patient information. HIPAA overhauled its standards to stop the leak Sweeney discovered. Her focus on demographics led to a focus on demographic fields in the HIPAA Privacy Rule itself. 

Nonetheless, Sweeney found that issues persisted: Re-identification still proved easy to do through both newspaper stories and public health records. “The goal is not to stop data-sharing,” she wrote in a 2015 article. Instead, the “goal is to be smarter about how we perform data sharing. This is particularly important as the top buyers of statewide databases are not researchers but private companies, especially those constructing data profiles on individuals.” She suggested solutions such as more stringent access requirements and stronger encryption methods.

Identifying weaknesses to strengthen privacy systems

Although Sweeney’s work might initially seem more concerned with identifying and picking apart privacy issues than with setting up intelligent data-sharing systems, the latter can’t exist without the former, she explained in her work: “It is an evolutionary cycle. First, a re-identification vulnerability becomes known, which leads to improved practices and technical solutions, which in turn leads to other re-identifications, and so on, until eventually we achieve robust technical, policy, or administrative solutions,” she wrote.

By probing for these underlying vulnerabilities, she uncovered massive data privacy leaks on online voter registration sites in 2016, leaks that opened up the possibility of voter identity theft attacks during that particularly contentious election. These are significant discoveries that have far-reaching implications for democratic systems and, on a very fundamental level, human rights.

But instead of despairing or falling into paranoia and anger, Sweeney advises a policy of continuous improvement: “Silence and fear break the development cycle in data privacy. Without an ability to learn about data sharing risks, knowledge stagnates and society blindly repeats the same errors in the face of increased technological vulnerabilities,” she wrote. Her continuous work in identifying vulnerabilities, fallacies and outright misuse has led to her spearheading technological developments such as “k-anonymity,” a privacy protection model focusing on databases, as well as “Scrub,” a process used in medical informatics that locates and replaces personally identifying information to protect sensitive patient information.
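
For readers curious what k-anonymity looks like in practice, here is a minimal sketch in Python. It relies on the standard definition: a table is k-anonymous if every combination of quasi-identifier values in it is shared by at least k records. The sample rows, the field names and the particular generalization step (keeping only the birth year and the first three zip digits) are assumptions made up for illustration, not a description of Sweeney's implementation.

```python
from collections import Counter

# Assumed quasi-identifier fields; real datasets may use different ones.
QUASI_IDENTIFIERS = ("birth_date", "sex", "zip")


def k_of(rows, quasi_identifiers=QUASI_IDENTIFIERS):
    """Return the largest k for which the rows are k-anonymous.

    This is the size of the smallest group of rows that share the
    same quasi-identifier values (0 for an empty table).
    """
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values()) if counts else 0


def generalize(row):
    """Hypothetical generalization: coarsen birth date to year, zip to 3 digits."""
    return {
        **row,
        "birth_date": row["birth_date"][:4],
        "zip": row["zip"][:3] + "**",
    }


# Invented sample rows.
rows = [
    {"birth_date": "1950-03-12", "sex": "M", "zip": "02139"},
    {"birth_date": "1950-07-30", "sex": "M", "zip": "02138"},
    {"birth_date": "1982-11-02", "sex": "F", "zip": "02139"},
    {"birth_date": "1982-01-15", "sex": "F", "zip": "02141"},
]

k_raw = k_of(rows)
k_generalized = k_of([generalize(row) for row in rows])
print("k before generalization:", k_raw)
print("k after generalization:", k_generalized)
```

Before generalization every row is unique, so k is 1 and each person can be singled out. After generalization each row shares its values with one other row, so k rises to 2, at the cost of less precise data.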

By uncovering vulnerabilities and discriminatory practices, Sweeney pushes open the space to create data-sharing solutions that help, instead of harm, our society.

About The Author · Nick White

Nick White is Osano's VP of Marketing. Nick has more than a dozen years of marketing leadership experience, most recently leading Wealthsimple's marketing as it grew from a couple hundred to over a million clients. Nick lives in Seattle with his wife and puppy.