More inclusivity since 2010 – but leading roles still favor white men
In a nutshell
- Computer analysis of 2,307 Hollywood films (1980-2022) shows women’s screen time increased from 25% to 40%, with films directed by women reaching gender parity while films directed by men averaged only 29% women’s screen time
- Black actors were significantly underrepresented in award-nominated films compared to box office hits, especially from 1990 to 2010, though representation has improved in the past decade
- While overall diversity is increasing, all groups except white men remain underrepresented in leading roles compared to supporting parts, suggesting meaningful representation requires looking beyond raw screentime numbers
BERKELEY, Calif. — Hollywood churns out hundreds of films each year, but who really gets meaningful screen time? Measuring actual representation has historically required scholars to watch hundreds of hours of footage and take meticulous notes. Now, researchers have revolutionized this process by letting computers do the heavy lifting to analyze over 2,300 movies and measure diversity with unprecedented precision.
Using an innovative approach nicknamed the “silicon gaze,” Berkeley researchers examined 2,307 movies amounting to 4,412 hours of footage and spanning both blockbusters and award nominees from 1980 to 2022. This massive undertaking was made possible by a 2021 exemption to the Digital Millennium Copyright Act that permits researchers to bypass DVD encryption, allowing them to analyze copyrighted films computationally for the first time in U.S. history.
The study, published in Proceedings of the National Academy of Sciences (PNAS), focused on two categories of films: “popular” movies (the top 50 box office earners each year) and “prestige” films (those nominated for Best Picture at major awards such as the Academy Awards and the Golden Globes). This dual approach allowed researchers to compare representation between films that achieved commercial success and those that garnered critical acclaim.
“I see this work as complementary to human viewing. I think that if you have the capacity to go watch hundreds of movies as other studies have done, you should do that because those methods are likely going to be more accurate,” says study author David Bamman, associate professor at UC Berkeley’s School of Information, in a statement. “But automation can give us access to measurement at a much larger scale. We can apply validated computer vision methods to a much larger collection of films than we could possibly watch, and at a finer granularity than we could measure by hand.”
The data revealed persistent gender imbalance throughout most of the studied period. Men dominated screen time across both popular and prestige films, with women’s representation hovering around 25% for three decades. However, recent years have shown improvement, with women’s screen time increasing to 40% by 2022.
Films directed by women had an average of 50.1% screen time for women, while those directed by men showed a much lower average of 29%. Perhaps most telling, women directed just 4% of all films studied. Combined with the screen-time gap between female- and male-directed films, that scarcity suggests representation behind the camera shapes what appears on screen.
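The director comparison comes down to averaging a per-film share within each group. The sketch below is hypothetical: it assumes each film has already been reduced to seconds of on-screen face time for women and for men plus the director’s gender, and it reads “average” as the mean of per-film shares rather than seconds pooled across all films. The `Film` fields and toy numbers are illustrative, not the study’s data or code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Film:
    title: str
    director_gender: str   # hypothetical label, e.g. "female" or "male"
    women_seconds: float   # assumed input: seconds women's faces are on screen
    men_seconds: float     # assumed input: seconds men's faces are on screen


def women_share(film: Film) -> float:
    """Women's fraction of the gendered face time in one film (0.0 if none)."""
    total = film.women_seconds + film.men_seconds
    return film.women_seconds / total if total > 0 else 0.0


def average_share_by_director(films: list[Film]) -> dict[str, float]:
    """Mean of per-film women's shares, grouped by director gender."""
    shares: dict[str, list[float]] = {}
    for film in films:
        shares.setdefault(film.director_gender, []).append(women_share(film))
    return {gender: mean(values) for gender, values in shares.items()}


# Toy example with made-up numbers.
films = [
    Film("Film A", "female", 3000, 3000),
    Film("Film B", "female", 3600, 2400),
    Film("Film C", "male", 1800, 4200),
]
print(average_share_by_director(films))  # female ≈ 0.55, male ≈ 0.3
```

Averaging per film gives every movie equal weight regardless of runtime; pooling seconds instead would let longer films count for more.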
Racial and ethnic representation showed similar patterns of historical disparity with recent improvement. Black, Hispanic/Latino, East Asian, and South Asian actors have gained more screen time over the past decade. However, the research uncovered a troubling trend in prestigious films: award-nominated movies consistently underrepresented Black actors compared to popular films, particularly during the 1990s and 2000s.
Modern movies have also become more diverse internally. Rather than diversity increasing only through films like “Black Panther” with predominantly non-white casts, researchers found a steady rise in scenes showing actors of different races and ethnicities together. This suggests a shift toward more integrated storytelling rather than segregated representation.
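One simple way to quantify this kind of integration is the fraction of scenes in which actors from at least two different perceived groups appear together. The sketch below assumes each scene has already been reduced to the set of perceived racial/ethnic labels of the actors detected in it; the function name and labels are illustrative, and the study’s actual scene definition and metric may differ.

```python
def mixed_scene_fraction(scenes: list[set[str]]) -> float:
    """Fraction of scenes with at least one identified actor that show two or more groups."""
    populated = [scene for scene in scenes if scene]  # skip scenes with no identified actors
    if not populated:
        return 0.0
    mixed = sum(1 for scene in populated if len(scene) >= 2)
    return mixed / len(populated)


# Toy example: four scenes, one with no identified actors.
scenes = [
    {"White"},
    {"White", "Black"},
    {"East Asian", "Hispanic/Latino", "White"},
    set(),
]
print(mixed_scene_fraction(scenes))  # 2 of the 3 populated scenes are mixed
```

Computing this value per film and then averaging by release year would produce the kind of trend line the researchers describe.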
But this study goes deeper than just counting faces on screen. Researchers examined the prominence of roles, revealing that all groups except white men had less representation in leading roles than in supporting parts.
“We find that there is still greater diversity in non-leading roles than there is within the leading ones,” says Bamman. “This highlights one of the advantages of our approach. A lot of work that’s looked at representation for race and gender using manual methods has focused, by necessity, on the leading actors, but we see here that there is a lot more diversity as you go further down the cast list.”
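Comparing leading and supporting roles means splitting each cast into tiers before measuring representation. The sketch below assumes, hypothetically, that the “leads” are the `n_leads` cast members with the most screen time in a film; the article does not say how the study defined leading roles, and the group labels and numbers are made up.

```python
from collections import defaultdict


def group_share_by_tier(
    cast: list[tuple[str, str, float]], n_leads: int = 2
) -> dict[str, dict[str, float]]:
    """Return each group's share of screen time among leads and among supporting actors.

    cast holds (actor, group, seconds) tuples for one film. As a hypothetical rule,
    the n_leads actors with the most screen time count as leads.
    """
    ranked = sorted(cast, key=lambda row: row[2], reverse=True)
    tiers = {"lead": ranked[:n_leads], "supporting": ranked[n_leads:]}
    shares: dict[str, dict[str, float]] = {}
    for tier, rows in tiers.items():
        seconds_by_group: dict[str, float] = defaultdict(float)
        for _actor, group, seconds in rows:
            seconds_by_group[group] += seconds
        tier_total = sum(seconds_by_group.values())
        shares[tier] = (
            {group: s / tier_total for group, s in seconds_by_group.items()} if tier_total else {}
        )
    return shares


# Toy cast for one film (made-up numbers).
cast = [
    ("A", "white man", 3000.0),
    ("B", "white man", 2500.0),
    ("C", "white woman", 900.0),
    ("D", "Black woman", 600.0),
    ("E", "white man", 500.0),
]
print(group_share_by_tier(cast))
```

In this toy film the lead tier is entirely white men while the supporting tier is mixed, which is the pattern the researchers found in aggregate.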
Rather than using algorithms to infer actors’ race, gender, or ethnicity, the researchers drew gender information from Wikidata and surveyed viewers about how they perceived actors’ racial and ethnic identities, avoiding automated guesses about identity.
“The rationale for thinking about perceptions is that we want to try to approximate the representation that an average viewer sees on screen, and not try to infer anything about the identities of actors, which is unknowable outside of statements by the actors themselves,” explains Bamman.
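The article does not describe how individual survey answers were combined into a single perceived label. As a hypothetical sketch, one could assign an actor the most common response only when more than half of respondents agree, and leave the actor unlabeled otherwise; the threshold and labels below are assumptions.

```python
from collections import Counter
from typing import Optional


def aggregate_perception(responses: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common perceived label if its share of responses exceeds min_agreement.

    With the default threshold of 0.5, a tie between two labels can never pass,
    so the result does not depend on how Counter orders tied labels.
    """
    if not responses:
        return None
    label, count = Counter(responses).most_common(1)[0]
    return label if count / len(responses) > min_agreement else None


print(aggregate_perception(["East Asian", "East Asian", "South Asian"]))  # East Asian
print(aggregate_perception(["Black", "Hispanic/Latino"]))  # None
```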
This research required extensive collaboration between multiple UC Berkeley departments and a special exemption from federal copyright law that came with strict requirements. Institutions must own physical copies of all analyzed films, and the research must be conducted in secure computing environments. The Berkeley team purchased all 2,307 DVDs for the study, in compliance with federal copyright regulations. All analysis was conducted on UC Berkeley’s Secure Research Data and Compute (SRDC) platform, which meets HIPAA and FERPA security standards for handling sensitive data.
While the data shows encouraging trends toward greater diversity, it also reveals persistent disparities that suggest Hollywood still has significant work ahead in achieving equitable representation both on screen and behind the camera. Hollywood plays a major role in shaping cultural narratives, and this study provides concrete evidence of where change is still needed. Its computational approach also lets researchers measure representation in far more depth than manual viewing allows, and it could set a new standard for measuring diversity in media.
Paper Summary
Methodology
Researchers analyzed 4,412 hours of footage across 2,307 films using computer vision technology to track actors’ faces throughout scenes. This was made possible by a 2021 federal exemption to the Digital Millennium Copyright Act, allowing higher education institutions to decrypt DVDs for research. The team matched faces to IMDb cast lists using AI models, consulted Wikidata for gender information, and conducted viewer surveys for racial/ethnic perception data. All analysis was performed on UC Berkeley’s Secure Research Data and Compute platform to maintain data security.
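The face-matching step can be pictured as nearest-neighbor search over face embeddings: each tracked face is compared with a reference embedding for every credited cast member, assigned to the closest match if it is similar enough, and its duration is added to that actor’s total. This is a hypothetical sketch, not the study’s pipeline; the 2D toy embeddings, the cosine-similarity rule, and the 0.5 threshold are all assumptions (real face embeddings have hundreds of dimensions).

```python
import math
from collections import defaultdict


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def screen_time_by_actor(
    tracks: list[tuple[list[float], float]],
    cast_embeddings: dict[str, list[float]],
    threshold: float = 0.5,
) -> dict[str, float]:
    """Assign each (embedding, seconds) face track to the most similar cast member and sum seconds."""
    totals: dict[str, float] = defaultdict(float)
    for embedding, seconds in tracks:
        scores = {actor: cosine(embedding, ref) for actor, ref in cast_embeddings.items()}
        if not scores:
            continue
        best_actor = max(scores, key=scores.get)
        if scores[best_actor] >= threshold:  # weak matches stay unassigned
            totals[best_actor] += seconds
    return dict(totals)


# Toy reference embeddings for two credited actors, and four face tracks.
cast = {"Actor A": [1.0, 0.0], "Actor B": [0.0, 1.0]}
tracks = [([0.9, 0.1], 4.0), ([0.1, 0.95], 2.5), ([0.8, 0.2], 1.5), ([-1.0, 0.0], 3.0)]
print(screen_time_by_actor(tracks, cast))  # {'Actor A': 5.5, 'Actor B': 2.5}
```

The last track matches nobody closely enough and is dropped, one reason a face-based measure can undercount some actors. Joining the per-actor totals with gender labels (from Wikidata, in the study’s case) would then give shares like those in the results.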
Results
Women’s screen time increased from 25% to 40% between 1980 and 2022. Female-directed films showed gender parity (50.1% women’s screen time) while male-directed films averaged 29% women’s screen time. Representation of Black, Hispanic/Latino, East Asian, and South Asian actors increased in the past decade. Black actors were significantly underrepresented in award-nominated films compared to box office hits, particularly from 1990 to 2010. All non-white groups and women had less representation in leading roles than in supporting roles.
Limitations
The study covered only top box office films and award nominees, and the copyright exemption required the institution to own physical copies of every film analyzed. The analysis counted only visible faces, so it did not capture an actor’s total time on screen or their dialogue. The research was restricted by copyright laws, limiting data sharing. The study acknowledged some biases in the computer vision technology used. Specifically, the method slightly overcounted men, Hispanic/Latino, and South Asian actors, while undercounting women. However, these biases were small, with corrections changing prevalence rates by at most 1.6 percentage points. The study examined perceived rather than self-identified gender and racial/ethnic identity.
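The article does not say how the bias correction was computed. One standard way to adjust a measured prevalence for known classifier error is the Rogan-Gladen estimator, which uses the sensitivity and false-positive rate estimated on a hand-checked validation sample and inverts observed = sensitivity × p + false-positive rate × (1 − p). The sketch below applies it to a made-up example in which women are undercounted; the rates are illustrative, not figures from the study, and the study may have used a different correction.

```python
def rogan_gladen(observed: float, sensitivity: float, false_positive_rate: float) -> float:
    """Correct an observed prevalence for misclassification.

    Inverts observed = sensitivity * p + false_positive_rate * (1 - p),
    then clips the estimate to the valid range [0, 1].
    """
    denom = sensitivity - false_positive_rate
    if denom <= 0:
        raise ValueError("sensitivity must exceed the false-positive rate")
    corrected = (observed - false_positive_rate) / denom
    return min(1.0, max(0.0, corrected))


# Made-up example: women appear in 40% of detections, the method finds 95% of
# true appearances by women and mislabels 2% of men's appearances as women's.
observed = 0.40
corrected = rogan_gladen(observed, sensitivity=0.95, false_positive_rate=0.02)
shift_points = (corrected - observed) * 100
print(f"corrected: {corrected:.3f}, shift: {shift_points:+.2f} percentage points")
```

Because women are undercounted in this toy setup, the correction nudges their share upward by under one percentage point, the same order of magnitude as the small adjustments the study reports.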
Discussion and Takeaways
The research provides quantitative evidence of Hollywood’s representation patterns while demonstrating the potential of computational analysis in film studies. The difference between female and male directors’ casting choices suggests increasing director diversity could improve representation. The disparity in leading roles indicates the need to examine role prominence, not just overall screentime. The study opens new possibilities for analyzing how actors are depicted and whether portrayals perpetuate stereotypes.
Funding and Disclosures
The research was supported by the Mellon Foundation and utilized UC Berkeley’s Secure Research Data and Compute Platform. The work involved collaboration between multiple UC Berkeley departments, including the School of Information, the Library, and the Samuelson Law, Technology and Policy Clinic.
Publication Information
This study was published in the Proceedings of the National Academy of Sciences (PNAS), Vol. 121, No. 46, on November 4, 2024. It is titled, “Measuring diversity in Hollywood through the large-scale computational analysis of film.” Authors include David Bamman, Rachael Samberg, Richard Jean So, and Naitian Zhou from UC Berkeley and McGill University.