Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it largely out of reach for anyone but well-funded tech companies. This is why Harvard University intends to release a dataset of around 1 million public-domain books, works no longer protected by copyright because of their age, spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare.
The new dataset is not yet available, and it is unclear when or how it will be provided. However, it incorporates books from Google Books, the company’s long-running book-scanning effort, so Google will be participating in the release of “this treasure trove far and wide.”
Harvard first teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” Little had been heard from the initiative since then until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to anyone, from academic laboratories to AI startups, who wants to train large language models (LLMs).
