Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it largely out of reach for all but well-funded tech companies. This is why Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare, whose works are no longer copyright-protected due to their age.
The new dataset is not yet available, and it is unclear when or how it will be released. It does, however, include books from Google Books, the company’s long-running book-scanning effort, so Google will take part in releasing “this treasure trove far and wide.”
Harvard originally teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” However, little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to everyone — from academic laboratories to AI startups — who wants to train large language models (LLMs).