
GenAI bots are pushing websites into a corner that imperils open access and, perhaps worse, the web's historical record. From @gluejar:

go-to-hellman.blogspot.com/202

Assuming that the web will continue to evolve instead of getting crushed underfoot, there is some interesting work going on at the IETF on how to build on the now-aged robots.txt protocol to let rights holders express how their content can be used online:

ietf.org/blog/aipref-wg/

Link preview: "AI bots are destroying Open Access" (go-to-hellman.blogspot.com): "There's a war going on on the Internet. AI companies with billions to burn are hard at work destroying the websites of libraries, archives, ..."
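For a concrete sense of where that IETF work is heading, the aipref drafts explore attaching machine-readable usage preferences to content, both in robots.txt and as an HTTP response header. A minimal sketch follows, with the caveat that the drafts are still evolving, so the "Content-Usage" field and the category names below are illustrative rather than final:

    # Sketch only: based on the aipref working group's draft direction;
    # the field name and vocabulary are not yet finalized.
    User-Agent: *
    Allow: /
    Content-Usage: train-ai=n, search=y

    # The same preference could also travel per-response as an HTTP header:
    # Content-Usage: train-ai=n

The point of the design is to express what content may be used for, rather than merely which crawler may fetch it.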

@edsu

My experience with @base and other web services run by Bielefeld University Library is in line with @gluejar's.

The IETF sounds naive when it claims that "[r]ight now, AI vendors use a confusing array of non-standard signals in the robots.txt file (defined by RFC 9309) and elsewhere to guide their crawling and training decisions," when in reality many of them ignore whatever signals a website sends them. They even plunder the shadow libraries.
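That "confusing array" is easy to see in practice: today a site that wants to opt out of AI training has to enumerate each vendor's crawler token by hand in robots.txt. The tokens below are ones the vendors themselves have published; the list is necessarily incomplete, and honoring it is voluntary, which is exactly the complaint above:

    # Per-vendor opt-outs as commonly published today.
    # Each vendor picks its own token; compliance is voluntary.
    User-agent: GPTBot            # OpenAI
    Disallow: /

    User-agent: ClaudeBot         # Anthropic
    Disallow: /

    User-agent: Google-Extended   # Google's AI-training opt-out token
    Disallow: /

    User-agent: CCBot             # Common Crawl
    Disallow: /

Any crawler not on the list is unaffected, and nothing here stops a crawler that simply ignores robots.txt.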

Ed Summers

@chpietsch yes, I guess you could look at it as naive. In many ways robots.txt was naive too. But one aspect to this is that we need ways for rights holders to assert their wishes, so that courts in jurisdictions that care (e.g. the EU) can use them as evidence. And there needs to be more nuance than what robots.txt provides:

mailarchive.ietf.org/arch/msg/

Link preview: "[ai-control] Who gets to make claims?" (mailarchive.ietf.org, IETF mail list archives)

@edsu @chpietsch Was it naïve or a brilliant way to avoid regulation? Remember "Do Not Track"? Ditto. I think naïve is thinking that the assholes in Big Tech don't know exactly what they're doing when they seek to avoid accountability. But hey, at this point, they have the world's most lethal military behind them so I guess accountability is moot.

@edsu@social.coop @chpietsch@fedifreu.de to be real I would prefer if we DIDN'T empower massive corporations to restrict content more strictly via copyright and IP law directly; this IS going to be abused to lock down information from regular people more than it helps them

What is truly naive is thinking that massive AI scraping operations which already break several laws are going to suddenly start following the rules if you make the rules good enough. In reality this is going to further lock down the internet while these massive scraping operations change nothing, keep doing crimes, and then claim everything is "fair use" in the courts, and no rights holder can complain anyway

It seems really misguided and bad to legally strengthen a system that heavily restricts information, all for a group of AI scrapers who will happily commit crimes and grab more than they're allowed regardless. In the AI world more data than your competitor literally means greater profits, and until that shifts this proposal will only harm regular people imo