Armeet Jatyani edited this page May 16, 2021 · 1 revision

Algorithmic Bias Analysis

FacePlusPlus Beauty Scores

Armeet Singh Jatyani
San Jose City College

1. Abstract

This paper identifies and evaluates the extent to which algorithmic bias affects the beauty scores assigned by the popular facial recognition service FacePlusPlus. Algorithmic bias is described as “systematic and repeatable error in a computer system that creates unfair outcomes, such as privileging one arbitrary group of users over others.”

2. Research Procedure

Selection of Regions

Research was conducted on five regions in South Asia: North (Punjab), Northwest (Rajasthan), South (Telangana), Deep South (Tamil Nadu), and West (Gujarat). These regions were selected for their differences in skin tone, facial features, and cultural appearance. Evaluating the FacePlusPlus facial recognition service on a varied dataset maximized the effectiveness of the study.

Image Sampling

Images were scraped from the internet and compiled into folders. In total, one thousand images were sampled for this study: two hundred for each regional group, half of them of men and half of women. To reduce the probability of confounding factors skewing the results, the following precautions were taken. Images were of varying quality and clarity. Subjects appeared with and without glasses and head-dresses, and in different clothing. Images of both fully visible and partially visible faces were used. Finally, images were sampled from all age ranges. Below are a few sample images from the dataset used in this research.

Scripting

FacePlusPlus has an API that allows developers to submit images to the facial recognition service and receive a variety of detected attributes for each image. These attributes include the number of faces in the image, the emotion of the shown face, gender, and beauty scores. The casual reader may test this beauty analyzer using the official website demo: https://www.faceplusplus.com/beauty/. For the purpose of this paper, we focus on the predicted beauty scores returned by the FacePlusPlus API service. All scripts in this project were written in Python. First, scripts were written to randomize and name all the dataset images. Next, scripts passed each image to the FacePlusPlus API and saved the responses. Finally, scripts parsed each of the JSON response files and converted them into CSV files, which were imported into a spreadsheet program for visualization. Additionally, the average beauty score was calculated for each region by taking the mean of all two hundred beauty scores given by the FacePlusPlus API.
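
The parsing and averaging steps described above can be sketched roughly as follows. This is a minimal illustration, not the scripts used in this study. The layout of the JSON response (a `faces` list whose first entry holds `attributes.beauty.male_score` and `female_score`) is an assumption about the API's output, and the function names `extract_scores`, `responses_to_csv`, and `average_score` are hypothetical. The CSV column names follow the table in the Data section.

```python
import csv
import io
import json
from statistics import mean


def extract_scores(response_text):
    """Pull the two beauty scores out of one saved JSON response.

    Assumed layout (not confirmed by the paper):
    {"faces": [{"attributes": {"beauty": {"male_score": ..., "female_score": ...}}}]}
    Returns None when the response contains no detected face.
    """
    faces = json.loads(response_text).get("faces", [])
    if not faces:
        return None
    beauty = faces[0]["attributes"]["beauty"]
    return beauty["male_score"], beauty["female_score"]


def responses_to_csv(responses, out_file):
    """Write one CSV row per image: id, male_beauty_score, female_beauty_score.

    `responses` is an iterable of (image_id, response_text) pairs.
    Images with no detected face are skipped.
    """
    writer = csv.writer(out_file)
    writer.writerow(["id", "male_beauty_score", "female_beauty_score"])
    for image_id, response_text in responses:
        scores = extract_scores(response_text)
        if scores is not None:
            writer.writerow([image_id, *scores])


def average_score(csv_file, column):
    """Mean of one score column across all rows of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return mean(float(row[column]) for row in reader)
```

In use, one CSV file would be produced per regional group, and `average_score` would be called once per file and score column to obtain the regional means. Skipping images in which no face is detected is a design choice of this sketch; the paper does not say how such cases were handled.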

3. Data

Figure A - Beauty Scores from the South Telangana Region

SOUTH_MALE  id  male_beauty_score  female_beauty_score
0           1   70.984             69.
0           2   64.726             63.
0           5   60.992             62.
0           6   47.615             57.
0           7   66.354             72.
0           8   57.082             64.
0           9   69.237             74.
0           10  39.36              47.
0           11  73.045             67.
0           12  72.634             72.
0           13  58.759             57.
0           14  57.802             55.
0           15  56.9               56.

Figure A shows fifteen of two hundred responses given by the FacePlusPlus API. The table shows the ID of each individual, the male beauty score, and the female beauty score. Specifically, this table covers the SOUTH_MALE group.

Figure B - Beauty Score vs. Region (By Gender)

In Figure B, blue bars represent the average beauty score of male individuals, while red bars represent the average beauty scores of female individuals.

Figure C - Beauty Score vs. Region (Averages)

In Figure C, we observe the average beauty scores (male and female) of all five regions.

4. Results

Looking at Figure C, we observe that the five regions had varying average beauty scores, as determined by the FacePlusPlus API. Theoretically, beauty should be normally distributed in every region, so a “perfect” algorithm would yield equal average beauty scores across the regions. The highest regional average beauty score was 17.2% higher than the lowest. This difference is significant and reflects the presence of a bias in the FacePlusPlus beauty algorithm. Additionally, in four of the five studied regions, the average beauty score for males was higher than the average beauty score for females.
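
The comparison between the highest and lowest regional averages can be expressed as a relative difference. The sketch below assumes that the gap is measured as a percentage of the lowest average, which matches the phrase "higher than the lowest" but is not spelled out in the paper. The function names `percent_higher` and `regional_gap` are hypothetical.

```python
def percent_higher(highest, lowest):
    """How much larger `highest` is than `lowest`, as a percentage of `lowest`."""
    if lowest <= 0:
        raise ValueError("lowest average must be positive")
    return (highest - lowest) * 100.0 / lowest


def regional_gap(region_averages):
    """Find the top and bottom regions in a {region name: average score} map.

    Returns (top_region, bottom_region, percent gap relative to the bottom).
    """
    top = max(region_averages, key=region_averages.get)
    bottom = min(region_averages, key=region_averages.get)
    gap = percent_higher(region_averages[top], region_averages[bottom])
    return top, bottom, gap
```

As an illustration with placeholder numbers that are not taken from this study, averages of 60 and 50 give a gap of 20%, because the ten-point difference is divided by the lower average. Measuring against the higher average instead would give a smaller percentage, so the choice of base should be stated alongside any such figure.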

5. Discussion/Conclusion

This study found a significant difference between the regional average beauty scores, as determined by FacePlusPlus’s facial recognition machine learning engine API service. The presence of this difference reflects a significant bias. The beauty scores returned by FacePlusPlus are percentiles. Thus, even a 2-3% difference in beauty scores between regions is massive. Although “beauty” is subjective, it is also relative, and therefore it must be normally distributed. A majority of individuals will have an “average” beauty score. A select few will be extremely beautiful or less beautiful than average. This is consistent across all genders, ethnicities, and regions. Thus, the best way to address this issue would be to determine beauty scores by training machine learning algorithms to consider ethnicity or features that are “deemed beautiful” specific to each region. Additionally, if there is an idea of “general/common beauty” that is consistent among all ethnicities, regions, and sexes, FacePlusPlus should be more explicit in describing the features that the algorithm uses to determine this score. Only then can steps be taken to reduce the algorithmic bias present in these facial recognition engines. The implications of this technology must also be considered. Many countries, such as China, already use beauty software to rank employees. Facial recognition is a powerful technology, yet algorithmic bias must be addressed to protect the general public from discriminatory treatment and “coded benefits” towards certain groups.