Reddit reportedly selling its users' content to an AI company for $60 million per year

(Image credit: Reddit)

According to a report from Bloomberg, Reddit has signed a content licensing deal worth $60 million per year that allows AI models to "train" on its users' submitted data. Officially, Reddit has declined to comment on the matter, but the timing aligns with expectations that the company will hold its initial public offering (IPO) in the coming months.

As expected, the move has been met with scrutiny and backlash in the hours since the story first dropped, but it isn't clear what recourse people reluctant to share their comments with AI engines have, at least outside of direct legal action. Social media platforms like Reddit monetize user data all the time. However, it seems doubtful that generative AI can remain viable in the long term without facing legal challenges. The picture becomes even murkier on a platform like Reddit, where copyrighted or even pirated content is often posted, even if it is later taken down.

Because the purported AI content licensing deal has not been officially announced (and Reddit itself has yet to go public), there is still a chance that Reddit may decide against implementing it, or that the final sum may be significantly different.

However, Reddit's actions could encourage similar moves from other social media platforms. This would be a major headache for artists and other users who don't want their work used to train AI models that many say resemble automated content theft machines more than intelligent entities.

According to an earlier Washington Post report and an anonymous source, Reddit has expressed willingness to cut search engines off from Reddit posts, declaring that the service "can survive without search." That threat was reportedly made because Reddit wanted to sell AI training data.


Christopher Harper has been a successful freelance tech writer specializing in PC hardware and gaming since 2015, and ghostwrote for various B2B clients in High School before that. Outside of work, Christopher is best known to friends and rivals as an active competitive player in various eSports (particularly fighting games and arena shooters) and a purveyor of music ranging from Jimi Hendrix to Killer Mike to the Sonic Adventure 2 soundtrack.

18 comments from the forums

COLGeek

Given the nature of much of the hosted content, I question what of value the models will "learn".

Aside from the questionable technical goodness of the data, what may be more interesting is how the legal aspects of these transactions evolve over time.

Good awareness article.
USAFRet

COLGeek said:
Given the nature of much of the hosted content, I question what of value the models will "learn".
Garbage in, garbage out.
Giroro

"Creators" never own their content. The platforms do.

If you have a problem with that, then good luck raising the few dozen million dollars you'll need to get regulatory approval to start your own server farm. I hear the strict environmental regulations big tech keeps pushing can be a real challenge for startups to navigate.

But it will be worth it in the end when you finally have a place you can freely post your personal pictures and tell people what you *really* think.
vanadiel007

Bablabibbloo boo. There, use that for training.
ThomasKinsley

Reddit has become the butt of jokes. Are AI companies sure they want to copy this?

https://www.youtube.com/shorts/JzU_5YoSegU
bit_user

I don't really mind if my github projects are being used to train AI. I opensourced that code for the benefit of others, so I don't care too much whether they benefit by using it directly, or via an AI service of some kind.

I sure hope nobody is using posts from these forums to train AI models...
:O
Actually, I think the posts marked "Best Answer" might not be a terrible way to educate an AI about general PC troubleshooting, but even those aren't consistently great. For sure, I'd filter out the rest of the posts...
randyh121

An AI trained on reddit posts is nothing new
DavidLejdar

Giroro said:
"Creators" never own their content. The platforms do.

If you have a problem with that, then good luck raising the few dozen million dollars you'll need to get regulatory approval to start your own server farm. I hear the strict environmental regulations big tech keeps pushing can be a real challenge for startups to navigate.
...
That is not correct. E.g., Taylor Swift songs are not YouTube's (/Google's/Alphabet's) property. The terms and conditions of such sites usually stipulate some details about what licence the creator gives to the platform. For the details in YouTube's case, see: https://www.youtube.com/t/terms#27dc3bf5d9
But yeah, if there is something a creator does not want, such as in the case of Reddit, then they sure do not have to use it. And it actually isn't that difficult to set up some hosting, though exposure is then a somewhat different topic.
Findecanor

COLGeek said:
Given the nature of much of the hosted content, I question what of value the models will "learn".
In my view, there is great diversity between "subreddits" (subforums) on Reddit. Some are mostly silly, others are very serious. Different ones have different rules of conduct, and different tones.

I would question the value of the posts as is though. There would need to be some kind of filter, perhaps based on an already highly trained model, to even start understanding how to train on forum posts.
Findecanor

Giroro said:
"Creators" never own their content. The platforms do.
AFAIK, international copyright laws dictate that both the web site and the user own copyright on each post. Neither can decide for the other what the other does with their copy of a post.
I'm allowed to collect my Reddit posts, and sell e.g. printed books with them if I want. But so is Reddit.

... Unless a post consists of something that was already under copyright owned by someone else, say: song lyrics, an image, a video, etc.
