Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it largely within reach of well-funded tech companies. That is why Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare, all of which are no longer copyright-protected due to their age.
The new dataset is not yet available, and it is unclear when or how it will be provided. However, it incorporates books from Google Books, the company’s long-running book-scanning effort, so Google will be participating in the release of “this treasure trove far and wide.”
Harvard originally teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” However, little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to everyone — from academic laboratories to AI startups — who wants to train large language models (LLMs).