Harvard and Google will offer one million public-domain books as an AI training resource.
AI training data is expensive, which puts it largely out of reach for anyone but well-funded tech companies. This is why Harvard University intends to release a dataset of around 1 million public-domain books spanning many genres, languages, and writers, including Dickens, Dante, and Shakespeare, all of which are no longer copyright-protected because of their age.
The new dataset is not yet available, and it is unclear when or how it will be provided. However, it incorporates books from Google Books, the company’s long-running book-scanning effort, so Google will be participating in the release of “this treasure trove far and wide.”
Harvard first teased the Institutional Data Initiative (IDI) in March, describing its plans to provide a “trusted conduit for legal data for AI.” Little had been heard from it until its formal introduction today, which confirmed that the IDI has financial backing from Microsoft and OpenAI.
According to Greg Leppert, executive director of the IDI, the dataset is intended to “level the playing field” by making such a massive dataset available to anyone, from academic laboratories to AI startups, who wants to train large language models (LLMs).