J. Nathan Matias 🦣

Is anyone looking at levels of volunteer activity in digital public goods like Wikipedia, Reddit, and Stack Overflow, and whether exploitation of user content is associated with a decline in human participation?

As companies use AI to convert communal public goods into AI products while extracting value from them, we could see a general decline in contributions that could lead to ecosystem collapse. Would love to see in-depth scholarship wrestling with this. Cc @mako

infoworld.com/article/3478485/

InfoWorld · The rise and fall of Stack Overflow. Big question marks hang over the programming and software development website with all the answers.

@natematias I'm not sure putting Wikipedia in the same bag with Reddit and Stack Overflow helps. Is there a good taxonomy of public digital goods? Comparing trends between such domains would be interesting. If people used to contributing to one place lose faith in it do they stop completely or start contributing somewhere else?

@Szescstopni yes! Social scientists consider them communal public goods, where people contribute to a common resource of knowledge without compensation via indirect reciprocity, where the person who helps you is different from the person you help. Such public goods face a risk to critical mass from free riders who benefit without contributing. An effective public good serves the maximum number of people freely while maintaining critical mass

academic.oup.com/ct/article-ab

@Szescstopni public goods can collapse entirely when people lose faith in them and stop contributing, and can be replaced by nothing, or by private goods. Forces that have done this in other domains include colonization, privatization, nationalization, and ecosystem collapse

@natematias I guess there's little hope. Funny thing, I stopped contributing to Polish Wikipedia years ago, because of the extreme conservatism of the editors, but have no problems with editing English Wikipedia when I see someone is wrong there :)

@natematias @mako at least Wikipedia is a nonprofit with an open license, so there is a reasonable understanding that contributors are giving their time and effort truly for the commons. OpenStreetMap is similar. I do not do unpaid volunteer work for for-profit companies. There is a history of privatizing volunteer work and data; the first time I was conscious of this happening was with CDDB, which seems almost quaint now: eff.org/deeplinks/2021/05/outl

Electronic Frontier Foundation · Outliving Outrage on the Public Interest Internet: the CDDB Story. This blog post is part of a series, looking at the public interest internet, the parts of the internet that don’t garner the headlines of Facebook or Google, but quietly provide public goods and useful services without requiring the scale or the business practices of the tech giants. Read our...

@natematias @mako and of course for-profit social media is basically about providing your content for free to a private company that is mining your content, connections, activities and eyeballs to sell ads and track you around other sites. Meta, X, Reddit, etc. all get a broad license to re-use your work when you post, according to the terms of service you supposedly agreed to at one point (which are updated repeatedly and which no one ever reads).

@kajord @natematias @mako "AI" isn't the commons, though. Wikipedia is made available specifically under a license and "AI" companies that scrape Wikipedia don't follow it.

Most of the intentional commons-y stuff (creative commons, wikipedia, "open source") in the 00s-10s was done under a sort of social contract where if you contributed you had guarantees about how it would be used, how you'd be credited etc. Now that social contract's broken cuz everyone knows AI corps just take and that's it.

@kajord @natematias @mako @mcc The license Wikipedia uses for pretty much everything requires that you mention where you got the content from. This is why all the assistants and search results say "According to Wikipedia, the free encyclopedia…". AI companies don't do that though, they just take the content and share no list of all the citations that they used for training.

@bwaber @mako oh wow, this is super helpful, thanks Ben!

arXiv.org · Consent in Crisis: The Rapid Decline of the AI Data Commons. General-purpose artificial intelligence (AI) systems are built on massive swathes of public web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge, we conduct the first, large-scale, longitudinal audit of the consent protocols for the web domains underlying AI training corpora. Our audit of 14,000 web domains provides an expansive view of crawlable web data and how codified data use preferences are changing over time. We observe a proliferation of AI-specific clauses to limit use, acute differences in restrictions on AI developers, as well as general inconsistencies between websites' expressed intentions in their Terms of Service and their robots.txt. We diagnose these as symptoms of ineffective web protocols, not designed to cope with the widespread re-purposing of the internet for AI. Our longitudinal analyses show that in a single year (2023-2024) there has been a rapid crescendo of data restrictions from web sources, rendering ~5%+ of all tokens in C4, or 28%+ of the most actively maintained, critical sources in C4, fully restricted from use. For Terms of Service crawling restrictions, a full 45% of C4 is now restricted. If respected or enforced, these restrictions are rapidly biasing the diversity, freshness, and scaling laws for general-purpose AI systems. We hope to illustrate the emerging crises in data consent, for both developers and creators. The foreclosure of much of the open web will impact not only commercial AI, but also non-commercial AI and academic research.

@bwaber @natematias @mako Stack Overflow is not newbie-friendly. Their decline was bound to happen.

@bwaber
Interesting! Do you know whether there are similar articles that look at the impact of SO making their data available for LLM training? I remember that this led some more senior contributors to stop contributing, and I'm wondering whether this could be a significant confounding factor for this population. I don't remember when these changes happened, though...
@natematias @mako

@natematias @mako well I signed an NDA but I can definitely tell you that SO has posted about this publicly and cares a lot about it

@natematias @mako @andresmh lots of links in the thread I’m seeing, but also: @nickmvincent and I have been chatting about this directly in HCI spaces, stay tuned

@natematias I mean, the more fundamental problem with Stack Overflow and its derivatives is that other people can edit your questions and replies. I don't know how to explain that that is wrong; a willful, grotesque violation of one of the most basic rules of the internet. When I was still intending on a career as an academic mathematician, I was planning on boycotting Math Overflow, an important resource for active researchers, by never posting there for this reason. Thus, I cannot pretend that I care much that it's dying. It's the closest thing to consequences that they're ever going to get.

@natematias @mako anecdotally, i can certainly tell you that i use Stack Overflow less than i did -- not because they will scrape my questions and answers (which is ghastly enough) but because AI *answers* are becoming more and more prevalent, so people may give you a correct-sounding answer that simply doesn't work, and that they didn't understand either, because they're trying to farm karma

@natematias @mako that's a good question, but I would frame it differently. Won't this abusive use of community data, of this "common good", lead the "volunteers" (really "citizens" doing polis-tics, that is, taking an interest in how the city is run) to turn their attention to more crucial subjects, such as ecological awareness?

@jts @nickmvincent this is very exciting, thanks for sharing! We have a grant that is going to support teams of researchers to do modeling on related topics, and I would love to talk further

@natematias @nickmvincent I shouldn’t speak for Nick, but I’m in, and I’m gonna guess he is too!