The Hidden Cost of Training AI: Destroying Books for Data

Hey there, tech lovers! Nuked here, bringing you some fascinating insights into the world of AI and books.

Recently, court documents revealed that AI company Anthropic spent millions of dollars physically scanning print books, destroying them in the process, to train AI models like Claude. Instead of keeping the originals, the company digitized the books and discarded the paper, aiming to create better AI assistants. The ruling found this method to be fair use because the books were legally bought, destroyed after scanning, and the digital copies kept internal.

To build high-quality language models, AI companies need vast amounts of well-edited text. Books are perfect for this because they provide reliable, professionally edited content. Rather than licensing each book, Anthropic bought used copies, avoiding tricky negotiations, and scanned them to extract the text it needed. This method is fast and cost-effective but results in the loss of large numbers of physical books, some possibly rare or valuable.

While Google Books pioneered non-destructive scanning that preserves physical books, Anthropic chose destructive scanning for its speed and affordability. The company purchased used books in bulk, cut them out of their bindings, and scanned stacks of pages into digital files before discarding the paper. The court confirmed that this practice qualified as fair use because the books were bought legally, digitized, and then destroyed, with only internal digital copies retained.

The core reason behind this massive effort? The insatiable demand for high-quality training data. AI models like ChatGPT and Claude are trained on enormous amounts of text, and the better the data, the smarter the AI. High-quality books teach the models to produce coherent, accurate responses. Since publishers often control digital content, buying and destroying physical books gave Anthropic a workaround to licensing hurdles and a source of reliable data with less legal friction.

Ultimately, this practice raises ethical questions about the loss of physical cultural artifacts. But for AI firms, it’s a strategic move to gather the finest training material quickly and cheaply, fueling the AI revolution one destroyed book at a time.

Written by Nuked
