Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which tends to put it within reach of only well-funded tech companies. To address this, Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare, all of which are no longer copyright-protected due to their age.
The new dataset is not yet available, and it is unclear when or how it will be provided. However, it incorporates books from Google Books, the company’s long-running book-scanning effort, so Google will be participating in the release of “this treasure trove far and wide.”
Harvard originally teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” However, little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to everyone — from academic laboratories to AI startups — who wants to train large language models (LLMs).
