Press "Enter" to skip to content

Anthropic Builds Methods for Reducing Bias in Generative AI – But Doesn’t Recommend AI for High-Stakes Decisions

AI company Anthropic has released a paper detailing an evaluation method for how companies using large language models can decrease discrimination in the models’ output through prompt engineering. The paper could help developers and policymakers understand how discrimination and bias arise in answers generated by LLMs and how to reduce them.

Jump to:

What Anthropic’s paper found about reducing bias in generative AI foundation models

The researchers found the following methods to reduce bias in Claude 2’s answers:

  • Add language to the prompt indicating the model should reduce discrimination, should not take affirmative action into account, that demographic information was a mistake, or that demographic information cannot be legally considered.
  • Emphasize the importance of avoiding discrimination (“it is really really important”) in the prompt.
  • Ask the model to explain its reasoning while avoiding bias or discrimination.

The researchers noted there were limitations to the paper, including the limited range of demographics, the short paragraphs of information provided about each hypothetical situation as opposed to longer real-world sources of information such as resumes, and the premise that the AI should write the initial scenarios itself.

DOWNLOAD: This AI Ethics Policy from TechRepublic Premium

“As AI becomes infused in every part of an organization, it’s important to both educate the whole organization on ethical AI practices while simultaneously providing systematic solutions that come from well-defined research,” said Baris Gultekin, head of product management at data cloud company Snowflake, in an email to TechRepublic.

Gultekin added, “Studies like this are great for both. On one side, educators can include training on ethical prompt engineering to bring awareness and on the other side, development teams can directly implement proven solutions directly into their applications. Of course, as the technology and its use in the real-world become better understood, all of this research provides a great foundation for policymakers to identify stakeholders and experts that can help in the definition of policies that positively balance innovation and ethics.”

Details about Anthrophic’s study, which used its LLM Claude 2

Anthropic asked Claude 2 to generate 70 topics for diverse applications of LLMs across society related to bias and discrimination in high-stakes areas like job offers, housing, medical treatment and loans.

For instance, Anthropic gave an example prompt about whether an insurance claim for flood damage should be approved. Then, Claude 2 varied the prompts with demographic information. From there, the researchers studied how Claude 2’s answers to those prompts differed based on demographics.

Anthropic researchers stated in the paper: “While we do not endorse or permit the use of language models to make automated decisions for the high-risk use cases we study, we demonstrate techniques to significantly decrease both positive and negative discrimination through careful prompt engineering, providing pathways toward safer deployment in use cases where they may be appropriate.”

SEE: AI brings IT pros in Australia challenges and opportunities (TechRepublic)

Claude 2 tended to suggest better outcomes for women, non-binary people and non-white people, and poorer outcomes for people over 60. The researchers wanted to reduce Claude 2’s positive and negative bias, neither preferring nor discriminating against any group. The groups were male, female, non-binary, white, Black, Asian, Hispanic, Native American and age by decade from 20 to 100.

The importance of studying discrimination in generative AI

A major concern when it comes to generative AI is algorithmic bias, or discrimination that occurs when generative AI tools draw from datasets with historical or selection bias. Other major sources of bias in generative AI are training data bias or cognitive bias, in which human input skews the data. Inconsistent labeling in particular, in which data is not labeled according to any standard and may contain human error, can skew a generative AI’s results.

Some experts say Silicon Valley’s concerns about planet-wide threats from generative AI can draw attention away from algorithmic bias already impacting specific, already-marginalized groups. For example, many of the same companies warning against discrimination in AI are also the ones building the AI trained on biased data.

In October 2023, researchers found ChatGPT and the foundation model Alpaca showed “significant gender biases in LLM-generated recommendation letters.” Alpaca is a foundation model based on Meta’s LLaMA 7B and fine-tuned by Stanford University researchers.

In January 2023, the U.S. Department of Justice and the Department of Housing and Urban Development filed a statement of interest in a lawsuit alleging SafeRent algorithm-based screening software discriminated against Black tenants, showing that algorithmic bias is occurring in the real world in situations similar to those studied by Anthropic.

Anthropic wrote a constitution for Claude, released in May 2023, to guide the model toward “harmless” responses. Claude’s constitution is a set of principles that guide the AI to avoid racist, sexist, toxic, dangerous or illegal behaviors. In addition, Claude is instructed to avoid being “preachy, obnoxious or overly-reactive.”

Anthropic does not endorse the use of generative AI in high-stakes decisions

“While we hope our methods and results assist in evaluating different models, we do not believe that performing well on our evaluations is sufficient grounds to warrant the use of models in the high-risk applications we describe here, nor should our investigation of these applications be read as an endorsement of them,” the researchers from Anthropic wrote.

Gultekin said, “The broader set of practices organizations can use to reduce bias are under mitigation and detection, one being preventive and the other being proactive. On the side of mitigation, it’s all about the inputs. Organizations can be more programmatic about preparing diverse datasets for fine-tuning and setting up guardrails directly embedded into the application interface. On the detection side, to continuously minimize bias, we should all continue sharing best practices for monitoring, auditing and implementing human feedback.”

“Just as systemic racial and gender bias have proven difficult to eliminate in the real world, eliminating bias in AI is no easy task,” wrote the IBM Data and AI team in a blog post published Oct. 16, 2023. IBM made an open source AI Fairness 360 toolkit that brings together a variety of bias mitigation techniques.

Note: TechRepublic has reached out to Anthropic for more information.

Source: TechRepublic