1st Edition

Ethics of Data and Analytics
Concepts and Cases

N.B. Teaching notes are available for instructors.  

The ethics of data and analytics, in many ways, is no different than any endeavor to find the “right” answer. When a business chooses a supplier, funds a new product, or hires an employee, managers are making decisions with moral implications. The decisions in business, like all decisions, have a moral component in that people can benefit or be harmed, rules are followed or broken, people are treated fairly or not, and rights are enabled or diminished. However, data analytics introduces wrinkles or moral hurdles in how to think about ethics. Questions of accountability, privacy, surveillance, bias, and power stretch standard tools to examine whether a decision is good, ethical, or just. Dealing with these questions requires different frameworks to understand what is wrong and what could be better.

Ethics of Data and Analytics: Concepts and Cases does not search for a new, different answer or seek to ban all technology in favor of human decision-making. Instead, the text takes a more skeptical, ironic approach to current answers and concepts while identifying and standing in solidarity with others. Applied to the endeavor of understanding the ethics of data and analytics, this means emphasizing multiple ethical approaches as ways to engage with current problems and find better solutions, rather than prioritizing one set of concepts or theories. The book works through cases to understand those marginalized by data analytics programs as well as those empowered by them.

Three themes run throughout the book. First, data analytics programs are value-laden in that technologies create moral consequences, reinforce or undercut ethical principles, and enable or diminish rights and dignity. This places an additional focus on the role of developers in their incorporation of values in the design of data analytics programs. Second, design is critical. In the majority of the cases examined, the purpose is to improve the design and development of data analytics programs. Third, data analytics, artificial intelligence, and machine learning are about power. The discussion of power—who has it, who gets to keep it, and who is marginalized—weaves throughout the chapters, theories, and cases. In discussing ethical frameworks, the text focuses on critical theories that question power structures and default assumptions and seek to emancipate the marginalized.

 

Introduction to Book

1 Value-laden Biases in Data Analytics

Chapter 1: The goal of this chapter is to examine how technologies – including computer programs and data analytics – have biases or preferences. The discussion about whether technology does things or has preferences emanates from a concern as to who is responsible for outcomes. The arguments traditionally fall into two camps: those that focus on the technology as the actor that ‘does’ things and is at fault (technological determinists) and those that focus on the users of that technology as determining the outcome (social determinists). The readings chosen take a different approach by acknowledging the value-laden biases of technology – including data analytics – while preserving the ability of humans to control the design, development, and deployment of technology. Readings included are from Langdon Winner, Batya Friedman and Helen Nissenbaum, and Gabbrielle Johnson. The cases include the vaccine allocation algorithm from Stanford hospital and a health care allocation algorithm.

Introduction to Chapter 1

Cases: Biases, Politics, and Technology

  • This is the Stanford vaccine algorithm that left out frontline doctors. MIT Technology Review
  • Racial bias in a medical algorithm favors white patients over sicker black patients. The Washington Post.

Concepts: Biases, Politics, and Technology

  • Langdon Winner, The Whale and the Reactor
  • Friedman, Batya, and Helen Nissenbaum. Bias in computer systems.
  • Johnson, Gabbrielle. Forthcoming. Are Algorithms Value-Free?
  • Martin, K. Algorithmic Bias and Corporate Responsibility: How companies hide behind the false veil of the technological imperative

 

2 Classical Ethical Theories and Data Analytics

Chapter 2: Judging whether an act is right or wrong is the domain of ethics. The ideas behind a rules-based versus a consequences-based approach to ethics are covered by many books, articles, classes, and philosophers, and these approaches are summarized in the introduction of the chapter. The readings included in this chapter – on feminist ethics of care, virtue ethics, and principles based on critical race theory – widen the lens through which we judge data analytics programs: they help us better see those who are marginalized and not easily seen or heard, ground our analysis in the details of design, and offer ethical approaches with a unique voice that should be heard from the authors themselves. These approaches should be seen as broadening the lens by which we examine data analytics programs. In addition, data analytics programs predict and categorize people, and in doing so reinforce or undermine existing categorizations and power structures – who is allowed in and who is rejected, who is recognized and who is not. Critical theories are therefore a crucial tool for understanding the power being distributed with data analytics programs. Readings included are from Shannon Vallor on technomoral virtues, Carolina Villegas-Galaviz on the ethics of care and AI, and Poole et al. on using critical race theory in examining AI. The cases are on natural language processing and PimEyes, a facial recognition website open to anyone to use.

Introduction to Chapter 2

Cases: Ethical Misses and Data Analytics

  • GPT-3 could herald a new type of search engine. MIT Technology Review
  • How to make a chatbot less sexist. MIT Technology Review
  • This facial recognition website can turn anyone into a cop — or a stalker. The Washington Post

Concepts: Ethical and Critical Approaches

  • Vallor, Shannon. Technomoral Wisdom for an Uncertain Future
  • Villegas-Galaviz, Carolina. Ethics Of Care As Moral Grounding For AI.
  • Poole et al. Operationalizing critical race theory in the marketplace

 

3 Privacy and Shared Responsibility

Chapter 3: How we define privacy is important to the ethics of data and analytics because both the data being analyzed and the possible categorization of subjects can have privacy expectations. The summary gives an overview of the traditional control version and restricted-access version of privacy. Both versions place an enormous focus on the handoff of information to others. In other words, when information is turned over to a person or company, access to that information is no longer restricted and the individual no longer has control of that information. For most of us, that just seems wrong. We regularly disclose information to people and companies with strong expectations as to how it will be used or further shared. The summary also covers the privacy paradox and the idea of privacy in public. The readings include Helen Nissenbaum on privacy as contextual integrity, Kirsten Martin on privacy as a social contract, and Clarissa Wilbur Berger on U.S. privacy law. The related case is a new case on ad tech, “Finding Consumers, No Matter Where They Hide,” and an article on Walgreens selling access to customer data. A second reading is by Timnit Gebru et al. on datasheets for datasets, with the related case on a wrongful arrest using facial recognition.

Introduction to Chapter 3

Cases:  Privacy

  • Original Case: Finding Consumers, No Matter Where They Hide: Ad Targeting and Location Data
  • How a Company You’ve Never Heard of Sends You Letters about Your Medical Condition, Gizmodo.

Concepts:  Privacy

  • Nissenbaum, Helen. A Contextual Approach to Privacy Online
  • Martin, Kirsten. Understanding privacy online: Development of a social contract approach to privacy
  • Berger, Clarissa W. Privacy Law For Business Decision-Makers In The United States

Cases: Biased Datasets

  • Wrongfully Accused by an Algorithm (Pub. 6/24/2020). The New York Times
  • Facial Recognition Is Accurate, if You’re a White Guy. The New York Times

Concepts:  Datasets

  • Gebru, Timnit et al. “Datasheets for datasets.”

 

4 Surveillance and Power

Chapter 4: New forms of data collection – online and offline – make surveillance more common and even its own industry. Surveillance can be by a single actor, such as an employer or government agency. However, surveillance is also the byproduct of the systematic collection, aggregation, and use of individual data. Companies that buy and sell consumer data create a destructive demand, where their thirst for consumer data pressures consumer-facing firms to collect and sell increasing amounts of information without regard to how the collection breaches privacy expectations. Surveillance is important for the ethics of data and analytics since companies collecting, aggregating, selling, and using consumer data create a negative externality when they contribute to a larger system of surveillance. As our readings will illustrate, surveillance is all about power. Surveillance is the persistent tracking of individuals – tracking that cannot be avoided – to control the surveilled. The readings include David Lyon on surveillance and the panopticon and Julie Cohen on surveillance as distinct from privacy. The two related cases provided are on location data aggregators and Clearview AI, a facial recognition company.

Introduction to Chapter 4

Cases:  Surveillance

  • Twelve Million Phones, One Dataset, Zero Privacy. 2019. The New York Times
  • The Secretive Company That Might End Privacy as We Know It. The New York Times

Concepts: Surveillance

  • Lyon, David. From Big Brother to Electronic Panopticon.
  • Cohen, Julie. Privacy, Visibility, Transparency, and Exposure.

 

5 Purpose of Corporation & Goals of Algorithms

Chapter 5: What is the goal of a corporation? The goal may seem clear – to benefit the company. However, two different approaches have emerged in the past few decades. One camp, based primarily on bankruptcy laws and cases, argues that when companies are being dissolved and sold for parts, those in charge (the board of directors) must make decisions that are in the interests of the company’s shareholders. The chapter summary provides the classic reading used for the focus on shareholder wealth maximization, Milton Friedman. However, shareholder wealth maximization (aka profit maximization) turns out not to be a helpful way to manage a company (e.g., Enron, Purdue Pharma). The three readings included here offer explanations as to the purpose of companies and the responsibilities of managers. Within the view that the purpose of the corporation is to create value for stakeholders, the criteria for projects are broader and more long-term. Readings include R. Edward Freeman on stakeholder theory and Lynn Stout on the different legal justifications for the purpose of the company. The related case for these readings is on the use of emotion recognition programs. Robert Frank is included on how companies can benefit from acting responsibly, and the related case, written for this book, is “Recommending an Insurrection: Facebook and Recommendation Algorithms.”

Introduction to Chapter 5

Cases:  Purpose of Corporation

  • The quiet growth of race detection software. The Wall Street Journal
  • A face-scanning algorithm increasingly decides whether you deserve the job. The Washington Post

Concepts:  Purpose of Corporation

  • Freeman, R. Edward. Managing for Stakeholders.
  • Stout, Lynn A. The problem of corporate purpose.

Case:

  • Original Case: Recommending an Insurrection: Facebook and Recommendation Algorithms.

Concepts:

  • Frank, Robert H. Can socially responsible firms survive in a competitive environment?

 

6 Fairness, Predictive Analytics, & Mistakes

Chapter 6: “That’s not fair” is a common refrain, whether complaining about a rule being enforced at work or that a sister got a larger ice cream cone. However, explaining why an act is not fair is quite nuanced. Definitions of justice and fairness are important for the ethics of data analytics because many times the programs are designed to allocate ‘things’ or ‘goods,’ to use the term of justice scholars. We care how goods like admittance to college, health care, bonuses, sentences, and even ice cream are allocated, whether by the government, a company, or our parents. The chapter summary explains why relying on mathematically convenient definitions of fairness does not address all questions of fairness. The readings explore three different approaches to fairness and justice in philosophy, each with a different answer to what it means to be fair and just. Readings include John Rawls and his theory of justice focused on liberty and the least fortunate, Robert Nozick with a focus on the acquisition and transfer of goods, and Michael Walzer with an excerpt from his ideas on spheres of justice. The related cases are (1) the COMPAS sentencing algorithm and (2) the use of predictive analytics in universities.

Introduction to Chapter 6

Cases:  Fairness and Justice

  • Machine Bias (COMPAS Algorithm). ProPublica
  • Bias in Criminal Risk Scores Is Mathematically Inevitable, Researchers Say. ProPublica.
  • Major Universities Are Using Race as a “High Impact Predictor” of Student Success. The Markup.

Concepts:  Fairness and Justice

  • Rawls, John. A Theory of Justice. Harvard University Press
  • Nozick, Robert.  Anarchy, State, and Utopia. Basic Books.
  • Walzer, Michael. Complex Equality.  Spheres of Justice.

 

7 Discrimination

Chapter 7: Data analytics programs are frequently used in decisions governed by the concepts of disparate treatment and disparate impact. Disparate treatment is the U.S. legal term for differentially treating a protected class (e.g., by gender, race, ethnicity, religion, or national origin) and requires proof not only of the differential treatment but also of the intent to treat a class of individuals differently based on their status. Disparate impact, by contrast, concerns facially neutral practices that disproportionately burden a protected class, without requiring proof of intent. A number of relevant laws governing decisions made by humans (and those augmented by data analytics) include the concepts of disparate treatment and disparate impact as measurements for discrimination: the Fair Housing Act of 1968, the Americans with Disabilities Act, the Age Discrimination in Employment Act, the Equal Credit Opportunity Act, and the Civil Rights Act of 1964. Readings include Solon Barocas and Andrew Selbst on the many ways data analytics, specifically data mining, can discriminate in the design of a program and Anna Lauren Hoffmann on the limitations of only discussing discrimination in regard to algorithms. The two related cases are (1) Amazon’s hiring AI program and (2) a predictive program used by banks.

Introduction to Chapter 7

Cases: Discrimination

  • Amazon scraps secret AI recruiting tool that showed bias against women. Reuters
  • Bias isn’t the only problem with credit scores—and no, AI can’t help. MIT Technology Review.

Concepts:  Discrimination

  • Barocas, Solon, and Andrew D. Selbst. 2016. “Big Data’s Disparate Impact.”
  • Hoffmann, Anna Lauren. “Where fairness fails: data, algorithms, and the limits of antidiscrimination discourse.”

 

8 Creating Outcomes and Measuring Accuracy

Chapter 8: Chapter 8 identifies the ethical issues in accuracy, including choosing the outcome variable, measuring accuracy, and the problem of creating accuracy. (1) The outcome variable chosen has implications as to what the organization thinks is important and whose interests are prioritized in the design of the data analytics program. (2) Accuracy is often used as the measure of a working data analytics program, where accuracy is how well the program predicts or categorizes people. Accuracy for the majority (even if meaningful) does not mean a program is accurate for all groups. (3) Predictive analytics faces the challenge of creating accuracy: individuals with a particular predictive score are treated differently than those with a different score, making measuring accuracy very messy. Categorizing someone with an outcome variable (promotable, hirable, trustworthy, high likelihood of recidivism, etc.) can push the individual into a course of treatment that then creates the outcome predicted by the program. The readings include Rachel Thomas and David Uminsky on issues with measuring outcomes and Kirsten Martin on designing algorithms to account for the inevitable mistakes. The related case is a predictive analytics program labeling students as future criminals.

Introduction to Chapter 8

Cases:  Measuring Accuracy

  • Pasco’s sheriff uses grades and abuse histories to label schoolchildren potential criminals. Tampa Bay Times.

Concepts:  Measuring Accuracy

  • Thomas, Rachel, and David Uminsky. The Problem with Metrics is a Fundamental Problem for AI.
  • Martin, Kirsten. Designing Ethical Algorithms

 

9 Gamification, Manipulation, and Analytics

Chapter 9: Gamification is part of a suite of data analytics tactics designed to influence decision making, alongside dark patterns, manipulative advertising, and deepfakes. All four seek to influence an individual – their beliefs, their behaviors, their decisions – in a manner that is not obvious to the target. When employed in their best possible use, these tactics act for the betterment of the individual (the target) and society. However, when employed in alternative uses, these data analytics tactics can be exploitative and undermine individuals’ decision making. Readings include Tae Wan Kim and Kevin Werbach with a framework to address major ethical considerations associated with gamification, Kirsten Martin on manipulation and the collection of intimate data about individuals, and Vikram Bhargava and Manuel Velasquez on social media and Internet addiction. Two related cases are on deepfake advertising and Uber’s use of gamification.

Introduction to Chapter 9

Cases:  Gamification and Manipulation

  • How Uber Uses Psychological Tricks to Push Its Drivers’ Buttons. The New York Times.
  • How Deepfakes could change fashion advertising. Vogue Business.

Concepts:  Gamification and Manipulation

  • Kim, Tae Wan and Kevin Werbach. Ethics of Gamification.
  • Bhargava, Vikram R., and Manuel Velasquez. Ethics of the attention economy: The problem of social media addiction.
  • Martin, K. Manipulation, Privacy, and Choice.

 

10 Accountability for AI

Chapter 10: Transparency, for data analytics, means providing enough information so that others can understand the performance of the program. Here we explore transparency in service of an explanation, of accountability, and of contestability. First, transparency may be in service of explaining the data analytics program. This idea is usually countered with a claim that the program is difficult if not impossible to explain – but we also know that is not exactly correct. Second, transparency is needed in service of accountability, to understand the role of the human responsible for the outcomes. This brings us to the third reason we request transparency – to ask questions – and our readings cover the idea of contestability as an ethical design principle. In the readings, Karen Hao explores how humans are inserted into technological systems for accountability, and Deirdre K. Mulligan, Daniel Kluttz, and Nitin Kohli offer ‘contestability’ as the ultimate goal of the many discussions around transparency. The related cases included are (1) Houston teachers rated by an algorithm and (2) a cheating detection program used on students taking tests online.

Introduction to Chapter 10

Cases:  Transparency and Accountability of Algorithmic Decision Making

  • Cheating-detection companies made millions during the pandemic. Now students are fighting back. The Washington Post.
  • Houston teachers to pursue lawsuit over secret evaluation system. Houston Chronicle

Concepts:  Transparency and Accountability of Algorithmic Decision Making

  • Hao, Karen. When algorithms mess up, the nearest human gets the blame.
  • Mulligan, Deirdre K., et al. Shaping Our Tools: Contestability as a Means to Promote Responsible Algorithmic Decision Making in the Professions

 

11 Ethics, AI Research, and Corporations

Chapter 11: Pushing the ethical evaluation of the design and development of AI – and data analytics generally – to outsiders has implications for how corporations critically evaluate their technology: who can ask questions, what questions can be asked, and how any critical, ethical evaluation is performed. These are the types of issues being debated right now in corporations and in academic research around computer science and data analytics. Whether corporations should be responsible for critically evaluating their own technology is not established, and some in computer science debate whether researchers should have to include ethics statements in their work. Corporations have pushed back against being responsible for the moral implications of their data analytics programs by limiting the type of research conducted in the organization or by outside researchers. We tackle this hard-to-square position – refusing to critically examine their own work while also not allowing others access to examine their products – through readings by Richard Rudner and Kirsten Martin and a case on Google Research and research on AI ethics.

Introduction to Chapter 11

Case:

  • Original Case: Google Research: Who is Responsible for Ethics of AI?

Concepts:

  • Martin, Kirsten. Ethical implications and accountability of algorithms.
  • Rudner, Richard. The scientist qua scientist makes value judgments