Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT


Stack Overflow, a legendary internet forum for programmers and developers, is coming under heavy fire from its users after it announced it was partnering with OpenAI to scrape the site's forum posts to train ChatGPT. Many users are removing or editing their questions and answers to prevent them from being used to train AI, and the site's moderators have punished those decisions with bans.

Stack Overflow user Ben posted on Mastodon about his experience editing his most successful answers to try to avoid having his work stolen by OpenAI.

@ben on Mastodon posts, "Stack Overflow announced that they are partnering with OpenAI, so I tried to delete my highest-rated answers. Stack Overflow does not let you delete questions that have accepted answers and many upvotes because it would remove knowledge from the community. So instead I changed my highest-rated answers to a protest message. Within an hour mods had changed the questions back and suspended my account for 7 days." — (Image credit: @ben@m.benui.ca)

Ben continues in his thread, "[The moderator crackdown is] just a reminder that anything you post on any of these platforms can and will be used for profit. It's just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you."

Harsh words, but words that ring true with fellow Stack Overflow users who are joining the post protest. Users are also asking why ChatGPT could not simply share the source of the answers it will dispense in this new partnership, both citing its sources and adding credibility to the tool. Of course, this would reveal how the sausage of LLMs is made, and would not look like the shiny, super-smart generative AI assistant of the future promised to users and investors.

Site moderators preventing high-popularity posts from being deleted is legally above-board. Angry users claim they are entitled to delete their own content from the site under the "right to be forgotten," a legal right most clearly codified in the EU's General Data Protection Regulation (GDPR). Among other things, the regulation protects the ability of consumers to delete their own data from a website and to have data about them removed upon request. However, Stack Overflow's Terms of Service contain a clause carving out Stack Overflow's irrevocable ownership of all content subscribers provide to the site.

Users who disagree with having their content scraped by ChatGPT are particularly outraged by Stack Overflow's rapid flip-flop on its policy concerning generative AI. For years, the site had a standing policy that prevented the use of generative AI in writing or rewording any questions or answers posted. Moderators were allowed and encouraged to use AI-detection software when reviewing posts.

Last week, however, the company began a rapid about-face in its public policy towards AI. CEO Prashanth Chandrasekar spent his quarterly blog post praising the merits of generative AI, saying "the rise of GenAI is a big opportunity for Stack." Moderators were quickly (and somewhat informally) instructed to cease removal of AI-generated questions and answers on the forum.

Stack is not alone in reversing a principled stance on AI for profit; Valve also silently removed its AI-art ban on Steam, allowing over 1,000 AI-powered games to flood the storefront. Stack Overflow's partnership with OpenAI also follows the LLM company's recent push for increased partnerships and marquee deals, including its major announcement of a $100 billion datacenter to be built with Microsoft.

The rampant chasing of money in the insanely profitable AI marketplace is exciting, but should be tempered; AI may consume a quarter of the power on the U.S. grid as soon as 2030, according to reports from industry professionals and agencies.


Sunny Grimm is a contributing writer for Tom's Hardware. He has been building and breaking computers since 2017, serving as the resident youngster at Tom's. From APUs to RGB, Sunny has a handle on all the latest tech news.

92 comments from the forums

Notton

It's feudalism, but with a techno touch to it.
All of your data is now capital for someone else because it just so happens to be sitting on their server.
JamesJones44

There have been lawsuits for copying and pasting from StackOverflow. Granted, these cases apply to when someone copied and pasted code from a product with a non-permissive license to StackOverflow, and then another user took that code from StackOverflow and pasted it into their own product, but it seems like the same risks would apply to LLM/Gemini/ChatGPT.

The terms about how the content can be used by StackOverflow and the various other Stack Exchange sites are fairly clear that the content is covered under Creative Commons. I would be curious to know if ownership in the way the posters want could be granted; it seems unlikely to me, but I would need a lawyer to answer that one. However, whether the content is scraped from the site or pulled from a backend database, I don't think it really matters under Creative Commons: the data is free to be seen (NOTE: I'm saying seen, not used. Used depends on where that code snippet came from and the ability to prove you didn't copy it from a licensed product).
fball922

All answers on SO should now be generated from AI so that the training will result in hallucinations :D
ThomasKinsley

Banning users for editing their own posts is archaic. The most creative users will now leave the platform and create their own places to share their knowledge. Doesn't Stack Overflow realize its users were its best asset? AI cannot replace them.
bit_user

I don't really see why people are so outraged over their public posts being used like this. When you post something on a public forum, especially a privately-owned one, you're basically giving it away and all control over it.

It would be different if these were presumed private communications, but there was clearly no pretense of that ever being the case.
bit_user

Notton said: "All of your data is now capital for someone else because it just so happens to be sitting on their server."
It's sitting there because you put it there!

This isn't like a search or advertising giant collecting stats on people's web usage patterns, or credit reporting agencies collecting data on your spending habits, this is a site you voluntarily made an account on and posted to! Furthermore, they're not asserting control over anything of yours you didn't actively upload to them!
CmdrShepard

bit_user said: "I don't really see why people are so outraged over their public posts being used like this."
Because the original purpose of your posts was to benefit other fellow human beings, not to enrich some soulless corporate entity?

bit_user said: "When you post something on a public forum, especially a privately-owned one, you're basically giving it away and all control over it."
If I am not mistaken, the established law is that you have and retain copyright over what you created and you aren't automatically giving that copyright up by publishing.

To elaborate a bit further why people are enraged:

When a developer posts an answer on SO they are giving back to the community for all those times they got an answer from there.

Nobody posting code answers on SO ever thought to include a usage license -- it looks now like they should have, but it's easy to be smart after the fact.

I don't mind SO monetizing my answers by putting ads next to them like other public platforms do.

What I do mind is a 3rd party paying a pittance (if that) for those answers in bulk and being able to use them to make a giant recurring profit by selling a service based on something I did. Even if they are paying SO, it's still theft -- they are stealing from me and using it not for one-off use like other fellow developers, but for perpetual revenue generation.

I did it for free for other people who did it for free for me. OpenAI never did anything for me for free, not to mention their name is the biggest lie ever, as neither their code nor their models are open-source.
Maltrer

Great article and thanks for apparently not using gpt to write it.
bit_user

CmdrShepard said: "Because the original purpose of your posts was to benefit other fellow human beings, not to enrich some soulless corporate entity?"
Regardless of their intent, people posting on there should've been smart enough to know what it means to use a platform like that.

CmdrShepard said: "If I am not mistaken, the established law is that you have and retain copyright over what you created and you aren't automatically giving that copyright up by publishing."
According to what others have said, in the ToS agreement you license your content to them under Creative Commons (?), which means giving up most of the control over what happens after that.

If you indeed retain copyright, then you're free to dual-license your content to another party or in another context, but you can't retroactively revoke the Creative Commons license on stuff you published, after it's already out there. Anyone who managed to get a copy of the content, while it was covered by Creative Commons, can continue to use it under those terms. And that includes the platform, themselves.

CmdrShepard said: "To elaborate a bit further why people are enraged:"
Sorry, in a colloquial turn of phrase, I happened to have mischaracterized my bewilderment. I actually do understand why they're upset. I guess what I find surprising is how many didn't appreciate the implications of posting on the internet, not even to speak of a privately-owned platform. Same with Reddit. Also Github.

And don't think we're immune from that on here, either. In fact, the main concern I have with the idea of someone using these forum posts to train AI is not that it'll misappropriate my knowledge, but feeding it all the other garbage that so many people post on here! Not anyone in this thread, of course - just those other posters!
😅

CmdrShepard said: "What I do mind is a 3rd party paying a pittance (if that) for those answers in bulk and being able to use them to make a giant recurring profit by selling a service based on something I did."
Eh, so very little of what's in our brains is truly original. The vast majority of it we're regurgitating, just like AI. It just so happens that AI is more efficient at it than we are. Before long, it'll be better, too.
USAFRet

And early on, most of the Stacks banned AI answers (I agree with this).
The vast majority were/are low quality crap.

Now...it is that they can use it, but users cannot?