
in reply to mcc

So developers will stop sharing information on #StackOverflow and future #Copilot and friends will be forever stuck in the past, answering questions about historically relevant frameworks and languages.
#LLM #StuckOverflow
in reply to chris@strafpla.net

@chris Yeah. But for this to be true, we need a Stack Overflow replacement. And when Reddit went evil, the move to Lemmy didn't seem to succeed as well as the move from Twitter to Mastodon.
in reply to mcc

IIRC Mastodon is older than Lemmy, and the current move to Mastodon/the Fediverse happened in multiple waves, so it may be too early for higher expectations.
For Stack Overflow I expect some degradation of quality since they accept “AI”-generated content. This may additionally frustrate high-quality authors and motivate them to leave. We’ll see.
What would a federated stack overflow look like if we were to invent it?
in reply to chris@strafpla.net

@chris I don't know. It's an interesting question because Stack Overflow is inherently more search-focused than Lemmy or Mastodon.

A good model for a distributed/ownerless SO might wind up looking more like bluesky than mastodon.

in reply to mcc

@chris And, of course, there's the weird element that the SO license *already* does not permit AI on a facial reading, and a distributed SO would probably be *easier* to scrape than the centralized one. So you're not actually preventing AI exploitation, you're only punishing one corporation (SO) for the AI bait-and-switch.
in reply to mcc

I personally see less problem in scraping a federated pool of knowledge but I absolutely hate that stackoverflow now owns this knowledge and can keep people from using it but sell “AI” as a service to them.
in reply to chris@strafpla.net

@chris I suppose one thing to consider is that if a federated pool of knowledge is CC-BY-SA, then we only need a court ruling that OpenAI violates CC-BY-SA and the federated pool becomes AI-safe. Whereas SO can change (or already has changed) the TOS so they own the rights to relicense all content.

…but of course, CC-BY-SA is also incredibly inconvenient for a SO clone because everyone will generally want to copypaste sample code!

in reply to mcc

So we’d be looking for Schrödinger’s license, allowing and forbidding closed derivative works at the same time :-)

(I have a feeling that a lot of licenses only work because nobody has a close look at how their objects are used.)

in reply to chris@strafpla.net

@chris If I were actually trying to create a stackoverflow clone, I'd have the default license be something like "all code blocks are CC0 but all human text outside the code blocks is CC-BY-SA". That would I think match the unspoken expectations both contributors and readers have.


in reply to mcc

That seems like a good and very straightforward approach; it would at least meet my expectations exactly.
in reply to mcc

@chris I *am* worried about the effect "AI" scraping is gonna have on copyleft in general, tho. I think people have for many years released copyleft on the rule of "hey, why not" and now the answer is "bc AI". (More thoughts: https://mastodon.social/@mcc/112209121196262534 ) Like, my proposed license in the last post would be very AI-friendly.
in reply to mcc

Most open source licenses, including permissive ones, require attribution. “AI” does not and cannot do attribution, so the vast majority of open source licenses are already AI-safe.

Of course, “AI” companies are already getting away scot-free with blatantly violating those licenses, so “safe” isn't really the correct word…

@chris

in reply to argv minus one

@argv_minus_one Until today we considered it sufficient if a derivative work attributed all its sources in one place. A collage of images or an application would come with a file or metadata carrying the necessary attributions and licenses.
Changing this would do damage.

I feel as if we’re aiming at our collective foot because we discovered a black spot.

in reply to chris@strafpla.net

@chris @argv_minus_one I have never once seen an LLM style "AI" product which even attempts to comply with attribution licenses.

This said, if the current subject of discussion is Stack Overflow, the content license on Stack Overflow is CC-BY-SA which is substantially stricter than an attribution license.

in reply to mcc

@argv_minus_one Oh, sorry if I wasn’t clear. I don’t mean to suggest that the current batch tries to comply; I just wrote that complying with the attribution clause would easily be possible.

We have different opinions on ML. I consider it okay for Altman et al. - and for the public! - to train the emperor’s new “#AI” on CC material if they follow the rules for derivative works (and the rest of the rules).
And I think that #AISafe is a dangerous can of worms.

in reply to chris@strafpla.net

@chris @mcc SO publishes database dumps so we could all make a fork and start from there with something more libre
in reply to Szymon Nowicki

@hey Good idea!
I was wondering if they still did; I expected that they had already stopped doing this.
I had a tool that indexed local copies of SO for referencing, but I keep forgetting to reinstall it and update the database.
Thanks for reminding me!
in reply to chris@strafpla.net

@chris they still do (https://archive.org/details/stackexchange) and still out of their own infrastructure.

IIRC they made Stack Exchange in response to the enshittification of another Q&A service, and when they designed it they promised to release the content under an open license and make it publicly available, so that once they go evil people can move on somewhere else, taking the content with them.

Which I guess might be heading in that direction.

in reply to mcc

@chris

Or the move from SlashDot to SoylentNews.

Simplest way: if you see a service that shows hints of this, warn your friends and get a large bucket of popcorn.

in reply to Billy Smith

@BillySmith @chris Don't look at me. I was part of the exodus from Slashdot to Kuro5hin. Which I thought actually went pretty well
in reply to mcc

This is really sad, I’m sorry.
It shows again that information can only persist if it is copied and spread.
That’s why publishing exclusively on a corporate platform is such a bad idea. Just imagine YouTube one day really succeeding in locking ‘their’ content away.
in reply to chris@strafpla.net

@chris
Louis Rossman is working on software to get around this. :D

https://en.wikipedia.org/wiki/Louis_Rossmann

https://www.youtube.com/watch?v=dqTYg6vnQvw :D

in reply to Billy Smith

@BillySmith On top of storing a lot of text over the years, I’ve been downloading most videos I consider exceptional for a while. I tried to extend this to everything that I read/watched using tools like https://archivebox.io, but this is still a crutch because it will only be archived for myself. To me something like PeerTube looks very interesting as a concept.
in reply to chris@strafpla.net

@chris
I've done the same.

When i looked at the streaming approach, i could see the future enshittification.

Peertube is great. :D

Another approach can be found here:

https://www.kickstarter.com/projects/mirlo/mirlo

and

https://mirlo.space/

in reply to mcc

@chris

I lost access to my SD account back in '99, but couldn't be bothered to find it again.

It was interesting to watch, but the hints of the bust-out were always there.

in reply to Billy Smith

@BillySmith To me it’s interesting that something that was so interesting to me 25 years ago has completely vanished from my perception today. I may remember Slashdot about 4 times a year or less. To be fair, I may think of Kuro5hin about 5 times, but only because I mention “The Metamorphosis of Prime Intellect” to someone, a #SciFi story that was published there.
in reply to mcc

@chris I've said it before and I'm sorry if I sound like a broken record:

Then they'll just scrape from the Stack Overflow replacement. Any creative works any human ever puts on the internet again is just training data now. There is no way we can share code with each other anymore *without* also giving it as a free gift to Sam fucking Altman and his ilk.

in reply to datarama

@datarama @chris If it is really the case I cannot prevent Altman from creating derivative works of anything I make, then I at least want to create the maximum possible financial consequences for any company which intentionally helps him. Stack Overflow may not have been able to prevent Altman from scraping their site. But they didn't have to accept his money.
in reply to mcc

@chris No, they didn't, and they're assholes for doing so. People *should* be leaving that moral dumpster fire of a site behind.

I just can't see how we can build an alternative without AI barons just using that as a pool of free labour instead. Licenses and copyright only apply to people like you and me now, not to them (as you've also pointed out).

in reply to datarama

yeah we share mcc's concerns about what this means for the commons. we refuse to give up on that, but it's going to be hard.
in reply to Irenes (many)

@ireneista @chris My point is that I can't see how it's even *possible* to maintain an open internet commons anymore, because a robot strip-mine is not the same as a commons.

I hope I'm wrong! But I can't see how. Anything that is openly available is free training data, and we can say "please don't use this as training data" all we want; they don't care (and they don't have to).

in reply to datarama

well, you're describing a constraint. the engineering mindset does say to start by doing that.
in reply to Irenes (many)

we don't know the solution yet either, but then we still don't feel like we have a sufficiently precise formulation of the goal, so... there is certainly stuff to think about
in reply to Irenes (many)

@ireneista @datarama This. What is the precise thing we are aiming at - and does it have the form of a foot?

(and then: who is the multitude of “we” and how many different things can be called “a thing”?)

in reply to Irenes (many)

@ireneista @datarama I meant to address that the people discussing #AISafe may consider their group a homogeneous “we” of known size, but I have my doubts about this.
Nonetheless a very interesting link, thank you!
in reply to Irenes (many)

@ireneista @chris In the EU, ML scrapers are legally required to respect a "machine-readable opt-out" for copyrighted content when training commercial systems (they can ignore copyright entirely for academic research).

The only current specification for that opt-out is W3C's TDMReP (https://www.w3.org/community/reports/tdmrep/CG-FINAL-tdmrep-20240202/). But it doesn't really work for, e.g., free software.

1/
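[For reference: the TDMReP report mentioned above lets a site publish its opt-out in a `/.well-known/tdmrep.json` file. The sketch below shows roughly how a compliant scraper might check it; the field names (`location`, `tdm-reservation`) follow the W3C community group report, but the glob-matching logic here is an illustrative assumption, not the spec's exact semantics.]

```python
import fnmatch

# A minimal mock of a /.well-known/tdmrep.json payload: a list of rules,
# each reserving (1) or waiving (0) text-and-data-mining rights for a
# location pattern. Field names follow the W3C TDMReP report.
TDMREP_EXAMPLE = [
    {"location": "/code/*", "tdm-reservation": 1},  # mining reserved
    {"location": "/blog/*", "tdm-reservation": 0},  # mining allowed
]

def tdm_reserved(path: str, rules: list[dict]) -> bool:
    """Return True if TDM rights are reserved for `path`.

    First matching rule wins; with no matching rule there is no
    machine-readable reservation (matching semantics assumed here).
    """
    for rule in rules:
        if fnmatch.fnmatch(path, rule["location"]):
            return rule.get("tdm-reservation", 0) == 1
    return False

print(tdm_reserved("/code/main.c", TDMREP_EXAMPLE))  # True
print(tdm_reserved("/blog/post-1", TDMREP_EXAMPLE))  # False
```

[As datarama notes next, nothing in this mechanism survives someone copying the files to a host with no such reservation.]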

in reply to datarama

@datarama @ireneista So if a public fediverse instance sets the TDM-reserved flag, its content can’t be used in a study on the prevalence of Nazi propaganda on the fediverse?
in reply to datarama

@ireneista @chris Say I self-host some code I've written and set up my site's TDMReP to tell scrapers that I have opted that code out.

Now someone copies my code and puts it on Github. And now it is no longer opted out. Someone else (including employees at the companies I'd want to opt out from) can unilaterally void my opt-out, simply by copying my things elsewhere.

And free software licenses by their nature permit them to!

2/

in reply to datarama

@ireneista @chris The only way around this would be by putting my opt-out in the license. But 1) that's not "machine-readable", and 2) if my license says you can't copy my code and put it elsewhere, it's not exactly friendly to the commons (or free software in any sense).

And this is about EU, which actually *has* a restriction on this. Other jurisdictions don't even have that.

To me, it looks like anyone who isn't an AI executive is fucked.

3/3

in reply to datarama

@datarama @ireneista Only fucked if your goal is to avoid machine learning on CC works completely, which IMO is pointless / missing the point. You can’t allow and disallow derivative works at the same time, and you have no idea what kind of “derivative works” people will come up with in the future. But derivative works from public works should be public and not be sold as closed source.
(And then there’s the questions AGPL tries to answer.)
in reply to chris@strafpla.net

@chris don't worry, they'll probably just stick bots in every matrix/gitter/slack/discord/zulip they can find and train models on that instead
in reply to caitp

@caitp @chris "so, why exactly do I have to wear a fursuit to fix my issues with systemd?"
in reply to likely not a disguised martian

@kyonshi @caitp When I made this year’s resolution to really embrace systemd for a year or two, I didn’t know about this perk!
Tangentially: whatever part of a system you are in, systemd pops up. Even in this thread.
in reply to chris@strafpla.net

@chris
So this would be the perfect time to start answering questions with subtle bugs in them, and just wait a while until your code is replicated in all kinds of custom projects
in reply to Robbert

@mjrider You’re right, the bugs in current AI generated content are too obvious to really spread :-)
in reply to chris@strafpla.net

@chris TBH SO has felt a little stuck in the past even before this. Seems like the answers I find are quite old and don't accurately reflect the state of the art. I find many answers that use deprecated features of APIs, frameworks, etc.
in reply to mcc

an article went around recently about "rewilding" the Internet that made the analogy to clear cutting an old growth forest. You get incredible wood, but you can only do it once.
in reply to mcc

Earlier today I edited my (small) set of Stack Overflow posts to add the sentence "I do not consent to my words being used to train OpenAI" to the end. Within hours, all these edits were reversed and I got a warning email for "removing or defacing content". I did not remove any content. If this small sentence is "defacing", it is a very minor defacement. In no way was the experience of other users made worse by me adding one sentence.

To Stack Overflow, you are not a person. You are "content".


in reply to mcc

Not only does Stack Overflow say you don't have a right to remove your words from Stack Overflow, according to Stack Overflow, you don't even have the right to decide what words Stack Overflow publishes under your name.
in reply to mcc

Stack Overflow is subject to the CCPA privacy law, just sayin’.
in reply to mcc

this email makes me so pissed off. It's a for-profit fucking enterprise; content which was posted for free leads to the profits of the shareholders. It's unpaid labour!!!!!!
in reply to Sashin

@Sashin @mcc User-Generated Content, baby.

It's been a goldmine for a bit longer than the term "Web 2.0" has been around, but until recently we have been taking it as a social contract that we give it to the corporation and they give it to the world for some ad revenue.

That social contract is rapidly coming apart as investors see more profit potential in newly enabled modes of exploitation.

Unknown parent

chris@strafpla.net

@deflockcom @hacks4pancakes I don’t know about a #searchEngine that unlists “#AI” / #LLM generated content, but this thread may be tangentially interesting to you:

https://mstdn.strafpla.net/@chris/112039450597316623

Unknown parent

chris@strafpla.net
@datarama @ireneista Better: The nazis just have to set the tdm:non-research constraint and they can spread propaganda without anybody being allowed to analyze it. Nifty!
Unknown parent

chris@strafpla.net
Thank you, I missed that.
(And I now have to check what “commercial” means. Let’s hope the study is not published in a book or in a newspaper.)