That’s why I’m on Lemmy. At least when they train AI on my posts here it’s not legitimized by some contract.
We did it reddit, we trained an AI to be the pure embodiment of cringe.
Dumb question for the Lemmy lawyers, if enough redditors joined could a class action lawsuit be filed to be paid for their content… Or is that so outside of the TOS that it’s not worth considering?
TOS dictates that Reddit owns all content on their platform, you’d have no case
Reddit doesn’t “own” the content, TOS only have users agree to give Reddit a license to do as it pleases.
Ah, right they don’t own it! It’s just stored on their servers, and they have exclusive rights to do whatever they’d like with it. But they don’t own it.
However, it gets interesting because under EU law, TOS terms that violate GDPR are not enforceable. So at least EU citizens could probably have some recourse.
There’s a lot of “at least EU citizens” going around lol
California has something similar too (CCPA), as do a few other non-EU countries and US states.
Americans find it odd that other people have legal protections.
And that’s why I deleted all my posts and comments before deleting my account. Sure, they could probably go back and restore it if they wanted but, so far, they haven’t.
Glad I landed here on Lemmy.
Yeah! Here, no one gets paid when someone else wants to profit off of all the free user generated content. Wait, what was our goal again?
On Lemmy all you need to do is follow every community you can find and you’ll get a stream of posts, comments, voting behaviour, edits, and even admin behaviour, all raw and unprocessed with all the metadata you could hope for without paying a penny.
I’m not saying every Lemmy server is being used to train AI models, but I’m sure the big ones are.
Presumably most of the current AI models have already had access to reddit data in the past, so I am a bit confused about why they would pay 60 million for it now.
I deleted all my comments last year. Recently I got a notification for a response to one of those comments. When I clicked the notification link, my comment and the response were visible. The comment doesn’t show up in my profile.
I’ve had the same experience. Most scripts just erase the comments available directly through your reddit profile, which is limited to the most recent ~2000 posts that you’ve made. To fully erase anything and everything, you need to request all your data from reddit, download the .zip and feed it into an application like shreddit.
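For anyone curious what “feed the export into a tool” amounts to, here is a minimal Python sketch of the first step: pulling every comment ID out of the data export, so that nothing depends on what the profile page happens to list. The file name `comments.csv` and the `id` column are assumptions about the export layout, not something confirmed in this thread, so check your own .zip before relying on them. This is not how shreddit itself is implemented.

```python
import csv
from pathlib import Path
from typing import List


def load_comment_ids(
    export_dir: str,
    filename: str = "comments.csv",  # assumed file name inside the unzipped export
    id_column: str = "id",           # assumed column holding the comment ID
) -> List[str]:
    """Return every non-empty comment ID found in the export's comment file."""
    path = Path(export_dir) / filename
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Skip rows where the ID is missing or blank instead of failing on them.
        return [row[id_column] for row in reader if row.get(id_column)]
```

The resulting list can then be handed to whatever edit or delete tool you use, instead of the IDs scraped from your profile.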
Interesting. I’ve specifically searched for some fairly unique content (Python scripts, etc) I posted in my time over there, and it hasn’t shown up at all.
So you left your Reddit account intact?
Edit: Fucking. Cunts. I just searched (had been a few months) and at least some of my data is back. I reckon they’ve done it ahead of the planned AI move and IPO.
Reddit was aggressively rate limiting tools used to delete and edit content in a funny way when the API pricing was announced. The API wouldn’t return an error, the rate limiting was silent, and the tools would report successful deletion or edits even when the edit or deletion wasn’t made.
I had to modify an existing script to handle the 5-second rate limit and, in lieu of deleting, I just rewrote each comment with a farewell.
Even then I did 3 passes (minor additional edits) in case Reddit was saving previous edits.
My content has stayed edited.
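A rough Python sketch of that approach, for illustration only: overwrite every comment several times with a pause between requests, then re-read each comment to catch edits that were silently dropped. The `CommentClient` interface, its `edit` and `fetch_body` methods, and the farewell text are all hypothetical stand-ins, not a real Reddit library or the original script. The 5-second delay and the 3 passes come from the description above.

```python
import time
from typing import Callable, Iterable, List, Protocol


class CommentClient(Protocol):
    """Hypothetical client interface; swap in whatever API wrapper you use."""

    def edit(self, comment_id: str, body: str) -> None: ...

    def fetch_body(self, comment_id: str) -> str: ...


def overwrite_comments(
    client: CommentClient,
    comment_ids: Iterable[str],
    farewell: str = "This comment has been removed by its author.",
    passes: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Overwrite each comment `passes` times; return IDs whose edit did not stick."""
    ids = list(comment_ids)
    final_text = farewell
    for attempt in range(1, passes + 1):
        # Vary the text slightly on each pass so every pass is a real edit.
        final_text = f"{farewell} [{attempt}]"
        for comment_id in ids:
            client.edit(comment_id, final_text)
            sleep(delay_seconds)  # stay under the rate limit

    # Do not trust a "success" response: re-read each comment and compare.
    failed = []
    for comment_id in ids:
        if client.fetch_body(comment_id) != final_text:
            failed.append(comment_id)
        sleep(delay_seconds)
    return failed
```

Anything returned in the failed list was reported as edited but still shows the old text, so it needs another attempt.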
Do you still have the Python script available?
I was fine with keeping my comments up before for the future searchers, but I’m not fine with that shithole making profit off of it.
I recently used shreddit with the --gdpr-export-dir flag and it worked perfectly.
Yep used ‘power delete suite’ to delete everything before I left.
I suspect Reddit holds a perfect copy of every edit you’ve ever made, including the first. For legal reasons if nothing else. Now also to prevent perfectly good AI training content from being deleted.
Well, I just discovered a bunch of my stuff had been restored. Says deleted account, but it’s there.
Deleting your account doesn’t delete your content AFAIK.
I was saying elsewhere I deleted all my content before deleting my account, but now some of my content is back.
Just in time to make new AI generated shitposts with AI generated replies & pump up those numbers for the IPO.
Can’t wait to read a post about how a novice AI finds it hard to animate human hands and some other AI suggests studying hentai porn to get the finger/tentacle movements just right. And ofc lots of ads. From AIs, to AIs, by AIs, for AIs.
r/TotallyNotRobots is spreading everywhere.
Reddit is run by pigeons and other birds/drones, confirmed. Actually we always knew that.
They are gonna love it when their chatbot also chooses that man’s dead wife.
There’s gonna be so many bots commenting “Actually…” Followed by the most incorrect information about the topic at hand possible.
Lol, so they’re going to be training their AI on… AI generated content? The uptick in that shit on reddit has made it more annoying than usual.
That and all the confidently incorrect shit on the site… Not to mention the constant in-jokes. I’m just imagining a chatbot responding to something about how to deal with grief with “I also choose this man’s dead wife!”
Can’t see how this could possibly go wrong.
$60m doesn’t seem like that much in an era where twitter could (have been) sold for $40b.
60 million a year for access to the relatively public data… That seems pretty good to me tbh.
Maybe, but with people saying reddit’s main value proposition is access to AI training data, and that reddit is worth n billion dollars, $60m seems like a pittance.
It’s just an API, right?
No, it’s really not.
Firstly, while the data may be public, it’s not “free”. Scraping reddit and using it to train an AI would likely contravene their terms of use, and you’d end up facing copyright issues similar to those the current generation of bots has.
Secondly, scraped data would be incomplete. You wouldn’t get anything edited or “deleted”, which would surely be available if you paid them. The edits and deletes would be very valuable for AI training.
Thirdly, by paying you would get the metadata that reddit has: geolocation, user agent, alt accounts, browsing habits, et cetera.
Fourthly, you wouldn’t get exclusivity. Locking out a competitor is worth something.
Idk why you are talking about scraping when I said API?
And is all that information in the training contract?
I assumed that when you said “it’s just an API” you were saying you’re paying $60m for an API as opposed to scraping for free.
Is all what information in the training contract?
Damn just 60 mil??
Yeah, the diarrhea of my shitposts over there alone is worth more, it’s what will make the future AI kinda smart & very depressed.
Like seriously, this must be fake. Add a zero and I’d still find it suspiciously cheap.
Funny, I don’t see anyone saying the AI companies have free right to Reddit’s content.
Can users opt out? Because the content belongs to the users
My layman’s understanding is that they include it in the TOS, and your only option would be to leave the platform and demand that they delete all your content, which they may or may not do. E.g. they could just train the AI on an older backup. Good luck getting your rights recognized and abided by.
The content belongs to users… they just license it to Reddit, for Reddit to do as it pleases:
It doesn’t, as soon as you post on reddit it becomes ‘content’ on their social media.
No, the user owns it, but by creating an account you provide Reddit a license to use that content in certain ways.
So, it’s yours, but you’ve agreed to let them do whatever they want with it as if it’s theirs, too.
Yes, as we left reddit, the option to delete everything and leave a memorable ‘fuck u/spez’ was always ours.
so the API thing was over nothing? brilliant
No, it was just preemptive to enforce control over who can programmatically read the site
Sounds like it’s time for me to actually log back in and delete all my old posts. I’ve been putting that off for too long.
Be sure to edit them before deletion in case they get restored. There have been reports of that happening.
Yeah true. Is power delete suite still the preferred method?
Trained on 99% reposts
And the outputs of bots. There has been a shocking increase in auto-generated comments on reddit in the past years and it’s turning the training data into a minefield.
Haven’t touched reddit socially in 8 months, but every now and then I’ll use it to search for opinions or instructions on things. Searched “reddit best domain registrar” recently and landed on a thread where top to bottom, every comment recommending a registrar was from a bot and/or banned account. No real person testimonials, all ads. And as AI implementations improve, that’s going to get harder to spot. In the meantime, I’m formatting searches like “best domain registrar lemmy” because reddit is legit that bad rn.
Add the bot problem to it and you’ll get garbage in, garbage out
Hell even the users didn’t exactly contribute good quality content.
That AI is going to get really racist, really fast, judging by the muck we all saw daily on Reddit.
Although it’s going to be really good at anime porn too. So there’s that.