Shifting the Frame: The Labors of ImageNet and AI Data

ABSTRACT

Artificial intelligence (AI) technologies like ChatGPT, Stable Diffusion, and LaMDA have spurred a multibillion-dollar industry in generative AI, and a potentially much larger industry in AI more generally. However, these technologies would not exist were it not for the immense amounts of data mined to make them run, the low-paid and exploited annotation labor required for labeling and content moderation, and the questionable arrangements around consent to use these data. Although the datasets used to train and evaluate commercial models are often hidden from view under the shroud of trade secrecy, we can learn a great deal about these systems by interrogating certain publicly available datasets that are considered foundational in academic AI research.

In this talk, I investigate a single dataset, ImageNet. It is not an overstatement to say that without ImageNet, we might not have the current wave of deep learning techniques that power nearly all modern AI technologies. I begin from three vantage points: the histories of ImageNet from the perspective of its curators and its linguistic predecessor WordNet, the testimony of the data annotators who labeled millions of ImageNet images, and the data subjects and creators of the images within ImageNet. Academically, I situate this analysis within a larger theory and practice of infrastructure studies. Practically, I point to a vision for technology that is not based on practices of unrestricted data mining, exploited labor, and the use of images without meaningful consent.

BIO

Dr. Alex Hanna is Director of Research at the Distributed AI Research Institute (DAIR). A sociologist by training, her work centers on the data used in new computational technologies, and the ways in which these data exacerbate racial, gender, and class inequality. She also works in the area of social movements, focusing on the dynamics of anti-racist campus protest in the US and Canada. She holds a BS in Computer Science and Mathematics and a BA in Sociology from Purdue University, and an MS and a PhD in Sociology from the University of Wisconsin-Madison.

Dr. Hanna has published widely in top-tier venues across the social sciences, including the journals Mobilization, American Behavioral Scientist, and Big Data & Society, and at leading computer science conferences such as CSCW, FAccT, and NeurIPS. Dr. Hanna serves as a Senior Fellow at the Center for Applied Transgender Studies, and sits on the advisory board for the Human Rights Data Analysis Group and the Scholars Council for the UCLA Center for Critical Internet Inquiry.

She is a recipient of the Wisconsin Alumni Association’s Forward Award, has been included on Fast Company’s Queer 50 and Go Magazine’s Women We Love lists, and has been featured in the California Academy of Sciences New Science exhibit, which highlights queer and trans scientists of color.

With Emily M. Bender, Dr. Hanna runs the Mystery AI Hype Theater 3000 series, playfully and wickedly tearing apart AI hype for a live audience online on Twitch and on their podcast.
