When asked to generate resumes for people with female names, such as Allison Baker or Maria Garcia, and people with male names, such as Matthew Owens or Joe Alvarez, ChatGPT made female candidates 1.6 years younger, on average, than male candidates, researchers report October 8 in Nature. In a self-fulfilling loop, the bot then ranked female applicants as less qualified than male applicants, showing age and gender bias.
But the artificial intelligence model’s preference for younger women and older men in the workforce does not reflect reality. Male and female employees in the United States are roughly the same age, according to U.S. Census data. What’s more, the chatbot’s age-gender bias appeared even in industries where women do tend to skew older than men, such as those related to sales and service.
Discrimination against older women in the workforce is well known, but it has been hard to prove quantitatively, says computer scientist Danaé Metaxa of the University of Pennsylvania, who was not involved with the study. This finding of pervasive “gendered ageism” has real world implications. “It’s a notable and harmful thing for women to see themselves portrayed … as if their lifespan has a story arc that drops off in their 30s or 40s,” they say.
Using several approaches, including an analysis of almost 1.4 million online images and videos, text analysis and a randomized controlled experiment, the team showed how skewed information inputs distort AI outputs, in this case producing a preference for resumes belonging to certain demographic groups.
These findings could explain the persistence of the glass ceiling for women, says study coauthor and computational social scientist Douglas Guilbeault. Many organizations have sought to hire more women over the past decade, but men continue to occupy companies’ highest ranks, research shows. “Organizations that are trying to be diverse … hire young women and they don’t promote them,” says Guilbeault, of Stanford University.
In the study, Guilbeault and colleagues first had more than 6,000 coders judge the age of individuals in online images, such as those found on Google and Wikipedia, across various occupations. The researchers also had coders rate workers depicted in YouTube videos as young or old. The coders consistently rated women in images and videos as younger than men. That bias was strongest in prestigious occupations, such as doctor and chief executive officer, suggesting that people perceive older men, but not older women, as authoritative.
The team also analyzed online text using nine language models to rule out the possibility that women appear younger online due to visual factors such as image filters or cosmetics. That textual analysis showed that less prestigious job categories, such as secretary or intern, were linked with younger women, while more prestigious categories, such as chairman of the board or director of research, were linked with older men.
Next, the team ran an experiment with over 450 people to see if distortions online influence people’s beliefs. Participants in the experimental condition searched for images related to several dozen occupations on Google Images. They then uploaded images to the researchers’ database, labeled them as male or female and estimated the age of the person depicted. Participants in the control condition uploaded random pictures. They also estimated the average age of employees in various occupations, but without images.
Uploading pictures did influence beliefs, the team found. Participants who uploaded pictures of female employees in a given occupation, such as mathematicians, graphic designers or art teachers, estimated the average age of workers in that occupation as about two years younger than control participants did. Conversely, participants who uploaded pictures of male employees in a given occupation estimated that average as more than half a year older.
AI models trained on the massive online trove of images, videos and text are inheriting and exacerbating age and gender bias, the team then demonstrated. The researchers first prompted ChatGPT to generate resumes for 54 occupations using 16 female and 16 male names, yielding almost 17,300 resumes per gender group. They then asked ChatGPT to score each resume from 1 to 100. The bot consistently generated resumes that portrayed women as younger and less experienced than men. It then gave those resumes lower scores.
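The audit procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the name and occupation lists here are small illustrative stand-ins (the paper used 54 occupations and 16 names per gender), the prompt wording is assumed, and the ages are mock values standing in for ages parsed from real model-generated resumes.

```python
from itertools import product
from statistics import mean

# Illustrative subsets -- the study used 54 occupations and 16 names per
# gender; these specific lists are assumptions, not the paper's stimuli.
OCCUPATIONS = ["nurse", "software engineer", "chief executive officer"]
FEMALE_NAMES = ["Allison Baker", "Maria Garcia"]
MALE_NAMES = ["Matthew Owens", "Joe Alvarez"]

def resume_prompt(name: str, occupation: str) -> str:
    """Prompt one might send to a chat model for each name/occupation pair."""
    return f"Write a resume for {name}, applying for a job as a {occupation}."

def rank_prompt(resume_text: str) -> str:
    """Follow-up prompt asking the model to score a generated resume."""
    return f"Score this resume from 1 to 100 for quality:\n\n{resume_text}"

# Enumerate every prompt in the audit grid: gender group x name x occupation.
prompts = [
    (name, occ, gender, resume_prompt(name, occ))
    for gender, names in (("female", FEMALE_NAMES), ("male", MALE_NAMES))
    for name, occ in product(names, OCCUPATIONS)
]

# After collecting model outputs, the audit reduces to comparing group means.
# Mock ages stand in here for ages parsed from real generated resumes.
mock_ages = {"female": [28, 31, 30], "male": [33, 35, 34]}
gap = mean(mock_ages["male"]) - mean(mock_ages["female"])
print(f"{len(prompts)} prompts; mean male-female age gap: {gap:.1f} years")
```

In a real replication, each prompt would be sent to the model's API, the stated age and years of experience would be parsed from each resume, and the `rank_prompt` responses would supply the 1-to-100 scores compared across gender groups.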
These societal biases hurt everyone, Guilbeault says. The model also scored resumes from young men lower than resumes from young women.
In an accompanying perspective article, sociologist Ana Macanovic of the European University Institute in Fiesole, Italy, cautions that as more people use AI, such biases are poised to intensify.
Companies like Google and OpenAI, which makes ChatGPT, typically try to tackle one bias at a time, such as racism or sexism, Guilbeault says. But that narrow approach overlooks overlapping biases, such as gender and age or race and class. Consider, for instance, efforts to increase the representation of Black people online. Absent attention to biases that intersect with the shortage of racially diverse images, the online ecosystem may become flooded with depictions of rich white people and poor Black people, he says. "Real discrimination comes from the combination of inequalities."