Common Crawl: The Internet's Backdoor for AI's Copyright Infringement?
So, Common Crawl, huh? Never heard of 'em until today, but apparently, they're the reason AI companies are feasting on copyrighted content like it's an all-you-can-eat buffet. A "nonprofit" scraping the entire internet and handing it over to the likes of OpenAI and Google? Give me a freakin' break.
"Freely Available Content"? Yeah, Right.
Their website claims they only grab "freely available content" and avoid paywalls. But The Atlantic's investigation, "The Company Quietly Funneling Paywalled Articles to AI Developers," paints a different picture. A picture of them sneakily archiving articles from major news outlets (The New York Times, The Wall Street Journal, even The Atlantic itself) that people actually pay to read. So AI can train on it for free. And when publishers ask them to stop? Common Crawl apparently just shrugs and says, "Oops, can't delete anything. Our file format is, like, immutable."
Immutable? What is this, freakin' Fort Knox? They're acting like they're preserving the Rosetta Stone for future generations. They ain't. They're enabling copyright theft on a massive scale.
And their Executive Director, Rich Skrenta? He thinks robots should be allowed to "read the books" for free? "The robots are people too," he says. Seriously? I'm pretty sure my Roomba doesn't have the same rights as, you know, actual people. It's a vacuum cleaner. And these AI models ain't sentient beings; they're algorithms designed to make someone a boatload of money.
The AI Industry's "Fair Use" Excuse
Of course, the AI companies are all hiding behind the "fair use" defense. Classic. Anything to avoid paying for the content they're profiting from. It's like walking into a restaurant, eating a steak, and then telling the owner, "Nah, I'm good. It's fair use. I'm just... inspired by your cooking."

And Common Crawl is playing right along. They're not just providing the raw data; they're actively helping assemble AI-training datasets. Nvidia even thanked them for their advice! It's a cozy little relationship, ain't it? A few donations from OpenAI and Anthropic, and suddenly, they're best buds.
I gotta ask: are these the same companies that are making it harder and harder to tell what's real and what's fake online?
"Information Wants to Be Free"... For Whom?
Common Crawl is spouting that techno-libertarian garbage about "information wanting to be free." But Stewart Brand, the guy who coined that phrase, wasn't talking about corporations exploiting content for profit. He was talking about the cost of distribution, not the value of creation. There's a difference. A huge one.
Skrenta even had the audacity to tell The Atlantic that their articles aren't a crucial part of the internet. "Whatever you're saying, other people are saying too, on other sites," he said. What a slap in the face to journalists who are out there doing the hard work of reporting. Guess all that original reporting is worth nothing, huh?
I mean, maybe I'm overreacting. Maybe I'm just an old-fashioned dinosaur clinging to outdated notions of copyright and intellectual property. Maybe the future belongs to the AI overlords, and we should all just accept our fate as content fodder for their insatiable algorithms.
But something tells me this ain't right. This ain't fair. And it sure as hell ain't "open."
