Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it largely out of reach for anyone but well-funded tech companies. To address this, Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare, all of which are no longer copyright-protected due to their age.
The new dataset is not yet available, and it is unclear when or how it will be provided. However, it incorporates books from Google Books, the company’s long-running book-scanning effort, so Google will be participating in the release of “this treasure trove far and wide.”
Harvard originally teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” However, little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to everyone — from academic laboratories to AI startups — who wants to train large language models (LLMs).