Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it largely within reach of well-funded tech companies only. This is why Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare. The books are no longer protected by copyright because of their age.
The new dataset is not yet available, and it is unclear when or how it will be released. However, it includes books from Google Books, Google’s long-running book-scanning effort, so Google will take part in releasing “this treasure trove far and wide.”
Harvard first teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” Little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making a collection of this scale available to anyone, from academic laboratories to AI startups, who wants to train large language models (LLMs).