Hard to imagine a signal that a website is a rugpull more intense than banning users for trying to delete their own posts
Like just incredible "burning the future to power the present" energy here
chris@strafpla.net
in reply to mcc
#LLM #StuckOverflow
mcc
in reply to chris@strafpla.net
chris@strafpla.net
in reply to mcc
For Stack Overflow I expect some degradation of quality, since they accept “AI”-generated content. This may additionally frustrate high-quality authors and motivate them to leave. We’ll see.
What would a federated stack overflow look like if we were to invent it?
mcc
in reply to chris@strafpla.net
@chris I don't know. It's an interesting question because Stack Overflow is inherently more search-focused than Lemmy or Mastodon.
A good model for a distributed/ownerless SO might wind up looking more like Bluesky than Mastodon.
mcc
in reply to mcc
chris@strafpla.net
in reply to mcc
mcc
in reply to chris@strafpla.net
@chris I suppose one thing to consider is that if a federated pool of knowledge is CC-BY-SA, then we only need a court ruling that OpenAI violates CC-BY-SA and the federated pool becomes AI-safe. Whereas SO can change (or already has changed) the TOS so they own the rights to relicense all content.
…but of course, CC-BY-SA is also incredibly inconvenient for a SO clone because everyone will generally want to copypaste sample code!
chris@strafpla.net
in reply to mcc
So we’d be looking for Schrödinger's license, allowing and forbidding closed derivative works at the same time.
(I have a feeling that a lot of licenses only work because nobody has a close look at how their objects are used.)
mcc
in reply to chris@strafpla.net
chris@strafpla.net
in reply to mcc
mcc
in reply to mcc
argv minus one
in reply to mcc
Most open source licenses, including permissive ones, require attribution. “AI” does not and cannot do attribution, so the vast majority of open source licenses are already AI-safe.
Of course, “AI” companies are already getting away scot-free with blatantly violating those licenses, so “safe” isn't really the correct word…
@chris
chris@strafpla.net
in reply to argv minus one
@argv_minus_one Until today we considered it sufficient if a derivative work attributed all its sources in one place. A collage of images or an application would come with a file or metadata carrying the necessary attributions and licenses.
Changing this would do damage.
I feel as if we’re aiming at our collective foot because we discovered a black spot.
mcc
in reply to chris@strafpla.net
@chris @argv_minus_one I have never once seen an LLM-style "AI" product which even attempts to comply with attribution licenses.
This said, if the current subject of discussion is Stack Overflow, the content license on Stack Overflow is CC-BY-SA which is substantially stricter than an attribution license.
chris@strafpla.net
in reply to mcc
@argv_minus_one Oh, sorry if I wasn’t clear. I don’t mean to suggest that the current batch tries to comply; I just wrote that complying with the attribution clause would easily be possible.
We have different opinions on ML. I consider it okay for Altman et al. - and for the public! - to train the emperor's new “#AI” on CC material if they follow the rules for derivative works (and the rest of the rules).
And I think that #AISafe is a dangerous can of worms.
Szymon Nowicki
in reply to chris@strafpla.net
chris@strafpla.net
in reply to Szymon Nowicki
I was wondering if they still did; I expected that they had already stopped.
I had this tool that indexed local copies of SO for referencing but I keep forgetting to reinstall it and update the database.
Thanks for reminding me!
Szymon Nowicki
in reply to chris@strafpla.net
@chris they still do (https://archive.org/details/stackexchange), and still from their own infrastructure.
IIRC they made Stack Exchange in response to the enshittification of another Q&A service, and when they designed it they promised to publish the content under an open license, publicly available, so that once they go evil people can move on somewhere else, taking the content with them.
Which I guess might be heading into this direction.
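Those dumps are per-site archives of XML tables (Posts.xml, Comments.xml, and so on). As a minimal sketch of what "taking the content with them" could look like, here is a fragment that pulls question titles out of a Posts.xml excerpt; PostTypeId and Title are real dump attributes, but the sample rows below are made up for illustration:

```python
# Sketch: extract question titles from a Stack Exchange data-dump fragment.
# In the dumps, Posts.xml is a single <posts> element whose <row> children
# carry attributes such as PostTypeId ("1" = question, "2" = answer) and Title.
import xml.etree.ElementTree as ET

def question_titles(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    return [
        row.get("Title")
        for row in root.iter("row")
        if row.get("PostTypeId") == "1" and row.get("Title")
    ]

# Hypothetical sample data in the dump's row format:
sample = """<posts>
  <row Id="1" PostTypeId="1" Title="How do I exit Vim?" />
  <row Id="2" PostTypeId="2" ParentId="1" />
</posts>"""

print(question_titles(sample))  # ['How do I exit Vim?']
```

Real dump files are large (gigabytes for stackoverflow.com), so a streaming parser such as `ET.iterparse` would be the practical choice there; the eager version above just keeps the sketch short.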
Billy Smith
in reply to mcc
@chris
Or the move from SlashDot to SoylentNews.
Simplest way: if you see a service that has hints of this, warn your friends and get a large bucket of popcorn.
mcc
in reply to Billy Smith
chris@strafpla.net
in reply to mcc
mcc
in reply to chris@strafpla.net
chris@strafpla.net
in reply to mcc
It shows again that information can only persist if it is copied and spread.
That’s why publishing exclusively on a corporate platform is such a bad idea. Just imagine YouTube really managing to lock ‘their’ content away one day.
Billy Smith
in reply to chris@strafpla.net
@chris
Louis Rossman is working on software to get around this. :D
https://en.wikipedia.org/wiki/Louis_Rossmann
https://www.youtube.com/watch?v=dqTYg6vnQvw :D
Youtube's Legal Team sent me a letter! 😃
chris@strafpla.net
in reply to Billy Smith
ArchiveBox
Billy Smith
in reply to chris@strafpla.net
@chris
I've done the same.
When I looked at the streaming approach, I could see the future enshittification.
Peertube is great. :D
Another approach can be found here:
https://www.kickstarter.com/projects/mirlo/mirlo
and
https://mirlo.space/
Billy Smith
in reply to mcc
@chris
I lost access to my SD account back in '99, but couldn't be bothered to find it again.
It was interesting to watch, but the hints of the bust-out were always there.
chris@strafpla.net
in reply to Billy Smith
datarama
in reply to mcc
@chris I've said it before and I'm sorry if I sound like a broken record:
Then they'll just scrape from the Stack Overflow replacement. Any creative work any human ever puts on the internet is just training data now. There is no way we can share code with each other anymore *without* also giving it as a free gift to Sam fucking Altman and his ilk.
mcc
in reply to datarama
datarama
in reply to mcc
@chris No, they didn't, and they're assholes for doing so. People *should* be leaving that moral dumpster fire of a site behind.
I just can't see how we can build an alternative without AI barons just using that as a pool of free labour instead. Licenses and copyright only apply to people like you and me now, not to them (as you've also pointed out).
Irenes (many)
in reply to datarama
datarama
in reply to Irenes (many)
@ireneista @chris My point is that I can't see how it's even *possible* to maintain an open internet commons anymore, because a robot strip-mine is not the same as a commons.
I hope I'm wrong! But I can't see how. Anything that is openly available is free training data, and we can say "please don't use this as training data" all we want; they don't care (and they don't have to).
Irenes (many)
in reply to datarama
Irenes (many)
in reply to Irenes (many)
chris@strafpla.net
in reply to Irenes (many)
@ireneista @datarama This. What is the precise thing we are aiming at - and does it have the form of a foot?
(and then: who is the multitude of “we” and how many different things can be called “a thing”?)
Irenes (many)
in reply to chris@strafpla.net
Plurality Playbook
chris@strafpla.net
in reply to Irenes (many)
Nonetheless a very interesting link, thank you!
datarama
in reply to Irenes (many)
@ireneista @chris In the EU, ML scrapers are legally required to respect a "machine-readable opt-out" for copyrighted content when training commercial systems (they can ignore copyright entirely for academic research).
The only current specification for that opt-out is W3C's TDMRep (https://www.w3.org/community/reports/tdmrep/CG-FINAL-tdmrep-20240202/). But it doesn't really work for e.g. free software.
1/
chris@strafpla.net
in reply to datarama
datarama
in reply to datarama
@ireneista @chris Say I self-host some code I've written and set up my site's TDMReP to tell scrapers that I have opted that code out.
Now someone copies my code and puts it on Github. And now it is no longer opted out. Someone else (including employees at the companies I'd want to opt out from) can unilaterally void my opt-out, simply by copying my things elsewhere.
And free software licenses by their nature permit them to!
2/
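For readers unfamiliar with the protocol: per the W3C community group's final report, one way to publish the opt-out datarama describes is a JSON file at `/.well-known/tdmrep.json` on the site's origin. A minimal sketch, in which the `/code/` path and the policy URL are hypothetical placeholders:

```json
[
  {
    "location": "/code/",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/tdm-policy.json"
  }
]
```

Which illustrates the problem datarama raises: the reservation is attached to the origin serving the file, so a copy of the same code hosted elsewhere carries no such reservation.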
datarama
in reply to datarama
@ireneista @chris The only way around this would be to put my opt-out in the license. But 1) that's not "machine-readable", and 2) if my license says you can't copy my code and put it elsewhere, it's not exactly friendly to the commons (or free software in any sense).
And this is about the EU, which actually *has* a restriction on this. Other jurisdictions don't even have that.
To me, it looks like anyone who isn't an AI executive is fucked.
3/3
chris@strafpla.net
in reply to datarama
(And then there are the questions AGPL tries to answer.)
caitp
in reply to chris@strafpla.net • • •likely not a disguised martian
in reply to caitp
chris@strafpla.net
in reply to likely not a disguised martian
Tangentially: whatever part of a system you are in, systemd pops up. Even in this thread.
Robbert
in reply to chris@strafpla.net
So this would be the perfect time to start answering questions with subtle bugs in them, and just wait a while until your code is replicated in all kinds of custom projects.
chris@strafpla.net
in reply to Robbert • • •Import Antigravity
in reply to chris@strafpla.net
4am ❧
in reply to mcc
mcc
in reply to mcc
Earlier today I edited my (small) set of Stack Overflow posts to add the sentence "I do not consent to my words being used to train OpenAI" to the end. Within hours, all these edits were reversed and I got a warning email for "removing or defacing content". I did not remove any content. If this small sentence is "defacing", it is a very minor defacement. In no way was the experience of other users made worse by me adding one sentence.
To Stack Overflow, you are not a person. You are "content".
mcc
in reply to mcc
Tek is not a criminal
in reply to mcc
Sashin
in reply to mcc
clacke: inhibited exhausted pixie dream boy 🇸🇪🇭🇰💙💛
in reply to Sashin
@Sashin @mcc User-Generated Content, baby.
It's been a goldmine for a bit longer than the term "Web 2.0" has been around, but until recently we have been taking it as a social contract that we give it to the corporation and they give it to the world for some ad revenue.
That social contract is rapidly coming apart as investors see more profit potential in newly enabled modes of exploitation.
chris@strafpla.net
Unknown parent
@deflockcom @hacks4pancakes I don’t know about a #searchEngine that unlists “#AI” / #LLM-generated content, but this thread may be tangentially interesting to you:
https://mstdn.strafpla.net/@chris/112039450597316623
chris@strafpla.net
Unknown parent
chris@strafpla.net
Unknown parent
(And I now have to check what “commercial” means. Let’s hope the study is not published in a book or in a newspaper.)