Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it mostly within reach of well-funded tech companies. To widen access, Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare. The books are no longer copyright-protected due to their age.
The new dataset is not yet available, and it is unclear when or how it will be provided. However, it incorporates books from Google Books, the company’s long-running book-scanning effort, so Google will be participating in the release of “this treasure trove far and wide.”
Harvard originally teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” However, little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to anyone, from academic laboratories to AI startups, who wants to train large language models (LLMs).
