Is anyone looking at levels of volunteer activity in digital public goods like Wikipedia, Reddit, and Stack Overflow, and whether exploitation of user content is associated with a decline in human participation?
As companies use AI to convert communal public goods into AI products while extracting value from them, we could see a general decline in contributions that could lead to ecosystem collapse. Would love to see in-depth scholarship wrestling with this. Cc @mako
https://www.infoworld.com/article/3478485/the-rise-and-fall-of-stack-overflow.html/amp/
I am particularly thinking about Stack Overflow and Reddit, but I wonder if this could be a more general issue.
https://www.theregister.com/2024/12/09/reddit_ai_answers_search_feature/
@natematias I'm not sure putting Wikipedia in the same bag with Reddit and Stack Overflow helps. Is there a good taxonomy of public digital goods? Comparing trends between such domains would be interesting. If people who are used to contributing to one place lose faith in it, do they stop completely or start contributing somewhere else?
@Szescstopni yes! Social scientists consider them communal public goods: people contribute to a common resource of knowledge without compensation, via indirect reciprocity, where the person who helps you is different from the person you help. Such public goods risk losing critical mass to free riders who benefit without contributing. An effective public good freely serves the maximum number of people while maintaining critical mass.
https://academic.oup.com/ct/article-abstract/6/1/60/4259000?redirectedFrom=fulltext
@Szescstopni public goods can collapse entirely when people lose faith in them and stop contributing, and can be replaced by nothing, or by private goods. Forces that have done this in other domains include colonization, privatization, nationalization, and ecosystem collapse
@natematias I guess there's little hope. Funny thing, I stopped contributing to Polish Wikipedia years ago, because of the extreme conservatism of the editors, but have no problems with editing English Wikipedia when I see someone is wrong there :)
@natematias @mako at least Wikipedia is a nonprofit with an open license, so there is a reasonable understanding that contributors are giving their time and effort truly for the commons. OpenStreetMap is similar. I do not do unpaid volunteer work for for-profit companies. There is a history of privatizing volunteer work and data; the first time I was conscious of this happening was with CDDB, which seems almost quaint now: https://www.eff.org/deeplinks/2021/05/outliving-outrage-public-interest-internet-cddb-story
@natematias @mako and of course for-profit social media is basically about providing your content for free to a private company that is mining your content, connections, activities and eyeballs to sell ads and track you around other sites. Meta, X, Reddit, etc. all get a broad license to re-use your work when you post, according to the TOS you supposedly agreed to at one point (which is updated repeatedly and no one ever reads).
@kajord @natematias @mako "AI" isn't the commons, though. Wikipedia is made available specifically under a license and "AI" companies that scrape Wikipedia don't follow it.
Most of the intentional commons-y stuff (creative commons, wikipedia, "open source") in the 00s-10s was done under a sort of social contract where if you contributed you had guarantees about how it would be used, how you'd be credited etc. Now that social contract's broken cuz everyone knows AI corps just take and that's it.
@kajord @natematias @mako @mcc The license Wikipedia uses for pretty much everything requires that you mention where you got the content from. This is why all the assistants and search results say "According to Wikipedia, the free encyclopedia…". AI companies don't do that though, they just take the content and share no list of all the citations that they used for training.
@natematias @mako Yeah there's a bunch on this. I know I saw a few talks on this fairly recently, but here's one recent paper that is really good on StackOverflow impacts: https://academic.oup.com/pnasnexus/article/3/9/pgae400/7754871
@natematias @bwaber @mako there is also https://arxiv.org/abs/2407.14933
@natematias @mako Can't believe I forgot this one: https://www.nature.com/articles/s41598-024-61221-0
@natematias @mako As an aside Burtch does excellent work in general, here's another one by him that might be of interest: https://youtu.be/pMOjvAEpHqk?si=7zlA1rWolufaawmj&t=58
@bwaber @natematias @mako Stack Overflow is not newbie-friendly. Their decline was bound to happen.
@bwaber
Interesting! Do you know whether there are similar articles that look at the impact of SO making their data available for LLM training? I remember that this led some of the more senior contributors to stop contributing, and I'm wondering whether this could be a significant confounding factor for this population. I don't remember when these changes happened, though...
@natematias @mako
@natematias @mako well I signed an NDA but I can definitely tell you that SO has posted about this publicly and cares a lot about it
@natematias @mako @andresmh lots of links in the thread I’m seeing, but also: @nickmvincent and I have been chatting about this directly in HCI spaces, stay tuned
@jts @mako @andresmh @nickmvincent oh wonderful!
@natematias I mean, the more fundamental problem with Stack Overflow and its derivatives is that other people can edit your questions and replies. I don't know how to explain that that is wrong; a willful, grotesque violation of one of the most basic rules of the internet. When I was still intending on a career as an academic mathematician, I was planning on boycotting Math Overflow, an important resource for active researchers, by never posting there for this reason. Thus, I cannot pretend that I care much that it's dying. It's the closest thing to consequences that they're ever going to get.
@natematias @mako anecdotally, i can certainly tell you that i use Stack Overflow less than i did -- not because they will scrape my questions and answers (which is ghastly enough) but because AI *answers* are becoming more and more prevalent, so people may give you a correct-sounding answer that simply doesn't work, and that they didn't understand either, because they're trying to farm karma
@natematias @mako that's a good question, but I would put it differently. Won't this abusive use of community data, of this "common good", lead the "volunteers" (really "citizens" doing polis-tics, that is, taking an interest in how the city is run) to turn their attention to more crucial subjects, such as ecological awareness?
A first, public, thing from @nickmvincent, myself, and Johanna Desprez: https://dataleverage.substack.com/p/tipping-points-for-content-ecosystems
Bigger thoughts coming, but wanted to follow up!
@jts @nickmvincent this is very exciting, thanks for sharing! We have a grant that is going to support teams of researchers to do modeling on related topics, and I would love to talk further
@natematias @nickmvincent I shouldn’t speak for Nick, but I’m in, and I’m gonna guess he is too!