Reddit reportedly selling its users' content to an AI company for $60 million per year

Current Reddit logo
(Image credit: Reddit)

According to a report from Bloomberg, Reddit has signed a content licensing deal that allows AI models to "train" on its users' submitted data, reportedly worth $60 million per year. Officially, Reddit has declined to comment on the matter, but the timing aligns with expectations of its initial public offering (IPO) in the coming months.

As expected, the move has been met with scrutiny and backlash in the hours since the story first dropped, but it isn't clear what recourse people reluctant to share their comments with AI engines have, at least outside of direct legal action. Social media platforms like Reddit monetize user data all the time; however, the long-term viability of generative AI without legal challenges seems questionable. That viability becomes even murkier on a platform like Reddit, where copyrighted or even pirated content is often posted, even if it is later taken down.

Since the purported AI content licensing deal has yet to be made public (and Reddit itself has yet to go public), there is still a chance that Reddit may decide against implementing it, or that the final sum may be significantly different.

However, Reddit's actions could encourage similar moves from other social media platforms. This would be a major headache for artists and other users who don't want their work used to train AI models that many say resemble automated content theft machines more than intelligent entities.

Unfortunately, it seems likely that users on any platform will have no recourse until the law is truly "written" around the use of generative AI, copyrighted works therein, and so on. Major cases like The New York Times against OpenAI will ultimately determine the long-term fate of business arrangements such as this. 

According to an earlier Washington Post report and an anonymous source, Reddit has expressed willingness to cut search engines off from Reddit posts, declaring that the service "can survive without search." That threat was reportedly made because Reddit wanted to sell AI training data.

  • COLGeek
    Given the nature of much of the hosted content, I question what of value the models will "learn".

    Aside from the questionable technical goodness of the data, what may be more interesting is how the legal aspects of these transactions evolve over time.

    Good awareness article.
  • USAFRet
    COLGeek said:
    Given the nature of much of the hosted content, I question what of value the models will "learn".
    Garbage in, garbage out.
  • Giroro
    "Creators" never own their content. The platforms do.

    If you have a problem with that, then good luck raising the few dozen million dollars you'll need to get regulatory approval to start your own server farm. I hear the strict environmental regulations big tech keeps pushing can be a real challenge for startups to navigate.

    But it will be worth it in the end when you finally have a place you can freely post your personal pictures and tell people what you *really* think.
  • vanadiel007
    Bablabibbloo boo. There, use that for training.
  • ThomasKinsley
    Reddit has become the butt of jokes. Are AI companies sure they want to copy this?

    https://www.youtube.com/shorts/JzU_5YoSegU
  • bit_user
    I don't really mind if my github projects are being used to train AI. I opensourced that code for the benefit of others, so I don't care too much whether they benefit by using it directly, or via an AI service of some kind.

    I sure hope nobody is using posts from these forums to train AI models...
    : O
    Actually, I think the posts marked "Best Answer" might not be a terrible way to educate an AI about general PC troubleshooting, but even those aren't consistently great. For sure, I'd filter out the rest of the posts...
  • randyh121
    An AI trained on reddit posts is nothing new
  • DavidLejdar
    Giroro said:
    "Creators" never own their content. The platforms do.

    If you have a problem with that, then good luck raising the few dozen million dollars you'll need to get regulatory approval to start your own server farm. I hear the strict environmental regulations big tech keeps pushing can be a real challenge for startups to navigate.
    ...
    That is not correct. E.g., Taylor Swift songs are not YouTube's (/Google's/Alphabet's) property. The terms and conditions of such sites usually stipulate some details about what licence the creator gives to YouTube. For the details in this case, see: https://www.youtube.com/t/terms#27dc3bf5d9
    But yeah, if there is something a creator does not want, such as in the case of Reddit, then they sure do not have to use it. And it actually isn't that difficult to set up some hosting - the exposure is then a bit of a different topic though.
  • Findecanor
    COLGeek said:
    Given the nature of much of the hosted content, I question what of value the models will "learn".
    In my view, there is great diversity between "subreddits" (subforums) on Reddit. Some are mostly silly, others are very serious. Different ones have different rules of conduct, and different tones.

    I would question the value of the posts as is though. There would need to be some kind of filter, perhaps based on an already highly trained model, to even start understanding how to train on forum posts.
  • Findecanor
    Giroro said:
    "Creators" never own their content. The platforms do.
    AFAIK, international copyright laws dictate that both the web site and the user own copyright on each post. Neither can decide for the other what the other does with their copy of a post.
    I'm allowed to collect my Reddit posts, and sell e.g. printed books with them if I want. But so is Reddit.

    ... Unless a post consists of something that was already under copyright owned by someone else, say: song lyrics, an image, a video, etc.