Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it largely out of reach for all but well-funded tech companies. This is why Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare, all of which are no longer copyright-protected due to their age.
The new dataset is not yet available, and it is unclear when or how it will be provided. However, it incorporates books from Google Books, the company’s long-running book-scanning effort, so Google will be participating in the release of “this treasure trove far and wide.”
Harvard originally teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” However, little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to everyone — from academic laboratories to AI startups — who wants to train large language models (LLMs).
