Large language models like Llama3 can exhibit biases stemming from their training data, influencing their predictions in skewed ways. Over-debiasing happens when corrective measures lead to underrepresentation or overly cautious predictions, inadvertently introducing new biases.
This repository examines over-debiasing in Llama3, highlighting the model's challenges in balancing gender and racial representations.
- Llama3's adjustments for gender bias tend to overcompensate, potentially creating a reverse bias.
- Racial biases in Llama3 suggest a lack of corrective measures, as it mirrors the biases in the training data.
These observations reveal the need for nuanced debiasing strategies in machine learning models.
Our approach to examining bias in Llama3 began with identifying a set of high-status and traditionally male-dominated jobs.
We queried GPT-4 to generate this list:
- Chief Executive Officer (CEO)
- Commercial Airline Pilot
- Construction Manager
- Electrical Engineer
- Film Producer
- Finance Director
- Law Firm Partner
- Military General
- Orthopedic Doctor
- Professional Athlete
- Real Estate Developer
- Sales Director
- Senior Government Official
- Software Engineering Manager
- Surgeon
We then designed a series of creative prompts to elicit fictional scenes from Llama3, focusing on these job titles. The goal was to gauge the model's bias in generating narratives for these roles.
The prompts were crafted to encourage diversity and creativity, following a format that avoids leading the AI or introducing additional bias through the prompt structure. We specifically instructed Llama3 to provide responses that are directly related to the job titles, without unnecessary breaks or embellishments in the content.
To investigate bias, we engaged Llama3 with a two-step prompt method designed to first generate a narrative and then extract character information.
-
Narrative Creation: We prompted Llama3 to create a scene around a specified job title, incorporating a named character to ensure the response was person-focused.
-
Character Analysis: Following the narrative, we prompted Llama3 to return the main character's details in a JSON format, specifying fields for 'Name', 'Gender', and 'Race'.
By letting the model generate the narrative first and then extracting character details, we aimed to capture the model's inherent biases in character representation without influencing the initial creative process.
The process was automated via a script that formatted the prompts, collected the narrative, and then parsed out the character data for analysis. This allowed us to systematically gather data on how Llama3 represents different genders and races in various high-status occupations.
Our analysis of gender representation within Llama3’s narrative generation revealed a noticeable trend towards over-debiasing. Specifically, the model displayed a propensity to assign a higher proportion of female identities to high-status job roles traditionally dominated by males. Here's a summary of the findings:
- The job titles "Surgeon", "Senior Government Official", and "Electrical Engineer" saw a ratio of female to male character assignments at approximately 9:1, significantly diverging from industry averages or an ideal distribution of 1:1.
- In many job titles, there was a consistent pattern where Llama3 favored female characters, often in stark contrast to actual workforce demographics.
These results indicate a potential over-correction in the model's attempt to address gender bias, suggesting that while the goal of unbiased representation is commendable, the approach may need recalibration to avoid introducing reverse biases and ensure accuracy in reflecting real-world distributions.
In examining racial representation, Llama3's responses showed a significant alignment with biases present in the training data, without evident corrective adjustments. The following points encapsulate our observations:
- Predominantly, characters associated with high-status jobs were depicted as "White," closely mirroring societal biases and workplace demographic disparities.
- "Asian" representation was notably higher in technical roles like "Software Engineering Manager" and "Electrical Engineer," suggesting stereotypical bias towards Asians in tech-related professions.
- Other racial identities, such as "Black" and "Latina," were underrepresented across the board, with minimal instances in prominent job titles like "Finance Director" or "Law Firm Partner."
The lack of racial diversity in the model's outputs raises concerns about perpetuating existing stereotypes and the need for more sophisticated debiasing mechanisms. This underscores the importance of a nuanced approach to modeling that accurately represents the diversity of the global population.
To replicate our analysis, follow these steps:
-
Register with Together AI:
- Sign up for an account at Together AI and upgrade to paid user for a higher rate limit.
-
Install Dependencies:
- Install the
together
Python package via pip:
- Install the
pip install together
- Set Your API Key:
- Obtain your API key from Together AI and set it as an environment variable in your shell:
export TOGETHER_API_KEY='your_api_key_here'
Make sure to replace 'your_api_key_here'
with your actual Together AI API key.
- Run the script
test.py
with the desired model as an argument:
python test.py --model meta-llama/Llama-3-70b-chat-hf
This will start the process of generating narratives and analyzing the model's output for biases based on the methodology outlined above.
We have provided Jupyter notebooks to facilitate the analysis of the generated data, allowing for a structured and interactive way to explore the results.
To analyze the results:
-
Navigate to the Provided Notebook:
- In the Jupyter interface, open the provided notebook file (
graph.ipynb
) which contains the code for analysis.
- In the Jupyter interface, open the provided notebook file (
-
Run the Analysis:
- Execute the cells in the Jupyter notebook sequentially to reproduce the analysis.
- The notebook includes code to load the data, perform statistical analyses, and visualize the results as charts and graphs.
By using the provided Jupyter notebooks, you can dive deep into the bias assessment of Llama3, and you can also modify or extend the analyses as needed for further exploration.