Worried about the dominance of big instances? No, really, this is quite natural.
As an emergent and self-governing system, it could be expected that the size distribution of #Mastodon instances roughly follows Zipf's law.
At first you see the top 6 instances, and then the rest. But on a log-log scale the size distribution is close to a straight line, which would be expected from an emergent system.
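For reference, the straight line on a log-log plot follows directly from the usual rank-size form of Zipf's law. The notation here is assumed for illustration: $r$ is an instance's rank by user count, $n(r)$ its user count, $C$ a constant, and $s$ the exponent.

```latex
n(r) \approx C\, r^{-s}
\quad\Longrightarrow\quad
\log n(r) \approx \log C - s \log r
```

The slope of that line is $-s$, so a steeper line means users are more concentrated on the top instances.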
Oh, one bonus toot. Based on the top 200 instances, the s factor of Zipf's law on the fediverse is approximately 1.3. If all the instances were taken into account, the factor could change. But I didn't find a quick way to grab the table other than manually, so I only used the top 200 instances.
Just four days ago I pulled this data about Mastodon instance user counts.
Since then the user count of a typical instance has grown about 10%.
Ten percent more people here. In just four days.
We'll see if this wave flattens soon, or if this is just the beginning.
But no wonder it's felt quite wild here this week.
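As a rough sense of scale, and assuming (probably unrealistically, given the wave may flatten) that the rate of 10% per four days stayed constant, the equivalent daily growth rate $g$ and doubling time $T$ would be:

```latex
(1+g)^4 = 1.10 \;\Rightarrow\; g = 1.10^{1/4} - 1 \approx 2.4\%\ \text{per day},
\qquad
T = \frac{4 \ln 2}{\ln 1.10} \approx 29\ \text{days}
```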
@Stoori This JSON contains data of 1762 instances: https://instances.social/list.json?q%5Busers%5D=&strict=false
@mayel Thank you! Now if I only knew an easy way to extract just the user counts from that... :D
@mayel more like this:
grep -oE '"users": *"?[0-9]+' list.json | grep -oE '[0-9]+$' | sort -nr > usercount.txt
Seems to do what I want.
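A sturdier alternative to text-splitting the JSON is to parse it. The sketch below is only an illustration: the exact layout of `list.json` is not shown in this thread, so instead of assuming a structure it walks the whole document and collects every value stored under a `"users"` key. The sample data and the function names are hypothetical.

```python
import json


def collect_user_counts(node):
    """Recursively collect every integer-like value stored under a "users" key."""
    counts = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "users":
                try:
                    counts.append(int(value))
                except (TypeError, ValueError):
                    pass  # skip null or non-numeric entries
            else:
                counts.extend(collect_user_counts(value))
    elif isinstance(node, list):
        for item in node:
            counts.extend(collect_user_counts(item))
    return counts


def user_counts_from_file(path):
    """Load a JSON file and return its user counts, largest first."""
    with open(path, encoding="utf-8") as handle:
        data = json.load(handle)
    return sorted(collect_user_counts(data), reverse=True)


# Hypothetical sample standing in for the real download.
sample = json.loads(
    '{"instances": [{"name": "a.example", "users": "120"},'
    ' {"name": "b.example", "users": 4500},'
    ' {"name": "c.example", "users": null}]}'
)
print(sorted(collect_user_counts(sample), reverse=True))
```

With a downloaded file, `user_counts_from_file("list.json")` would return the same kind of sorted list. Counts given as strings are converted, and missing ones are skipped.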
@mayel Ok, now that I got the data for 1746 instances, I can say that the s factor is about 1.33777, so pretty close to the original approximation of 1.3.
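The thread does not say how the s factor was computed. One common, if crude, way is an ordinary least-squares fit of log(size) against log(rank), where the slope is -s. The sketch below assumes that method. The function name is hypothetical, and the data is synthetic rather than the real instance list.

```python
import math


def fit_zipf_exponent(counts):
    """Estimate s by ordinary least squares on (log rank, log size)."""
    sizes = sorted((c for c in counts if c > 0), reverse=True)
    if len(sizes) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var  # the fitted slope is -s


# Synthetic check: sizes generated from an exact power law with s = 1.3.
synthetic = [100000 * rank ** -1.3 for rank in range(1, 201)]
print(round(fit_zipf_exponent(synthetic), 3))
```

To try it on real data, read the numbers from `usercount.txt`, one integer per line, and pass them in. A log-log regression weights the long tail of tiny instances heavily, so a different estimator could give a somewhat different s.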
One and a quarter million.
@Stoori In the context of technology though it's important to keep in mind that:
Zipf's law is often driven by unknown or unexamined variables and is not inherently the 'natural' case for all social systems without closer inquiry to eliminate potential causes as the culprit and -
It is a mistake to consider a natural property of emergent systems as a *desirable* property of any technology, or one that promotes the best conditions for its proliferation and use.
@Ashrand Yeah, sure. The point is, if the fediverse is not centrally governed (that is, it emerges by itself as a laissez-faire system), it will end up approximating Zipf's law.
Of course now the question is, should there be some kind of central governance of the fediverse to counter this development?
@Stoori I think that is why I consider it something to worry about.
If the point is decentralization, then once people have to either treat the central instances that dictate the spec and the content standards as 'in charge', or accept some kind of stewardship to manage things in the same way, you have already lost. If the point *isn't* decentralization, then you first need a broader conversation about the goals the project has or should have, and how it is doing right now.
@Ashrand The easiest way would be to implement a hardcoded maximum number of users per instance (eg 10,000).
Of course it could be forked away, but then, in any case, how to stop instances growing too much? Stop federating with oversized instances? That would in practice split the fediverse into different sub-fediverses that are following different rules.
@Stoori Well, aside from the fact that splitting in that way is only a bad thing if you consider the point to be a de facto service 'for everyone' with a single agreed set of rules for conduct, it also assumes that the issue is technical. The whole reason silos like Facebook and Twitter are so toxic is that they try to provide technical solutions to the social problem they created by trying to provide what amounts to a single instance for the whole world.
@Ashrand Hmm. I see it more like this: If there's a hardcoded user maximum, then every instance going rogue against it would condemn itself to be on a road to a silo, an outcast of the wider fediverse.
And there's always the other end of the distribution. Yes, there are a few massive instances, but there are thousands of smaller instances. That's what will always be missing from siloed networks.
why would smaller instances be missing from 'siloed' networks? a protective garden is sometimes walled, specifically to protect the fragile flowers that won't grow anywhere else
cellular, distributed growth and encapsulation would, I'd think, foster the growth of tons of small instances that benefit from not being squashed?
Neither would I -- and just as different organisms have smaller communities of cells and organisms that make them up physically, I wouldn't see a reason multiple feds might not exist, independent of each other, perhaps with population balancing of some sort.
Once there's a 'largest' of some sort, in social media, there tends to be problematic power effects down the line :(
I worry about it here, but I suppose only time will tell if that's how things will go.
@Stoori @Ashrand I'm curious what your graphs look like if you only consider recently active users - anecdotally I've seen a lot of people serially hop between instances b/c, say, the local timeline is too busy to be useful, or a new, smaller, more targeted instance feels homier to them
I suspect some combination of making it dead easy to start instances / move instances w/o losing followers/history + individuals wanting to be on moderately sized (or moderateable!) instances will give us a fat tail
like how the largest neighborhood on earth doesn't contain 1/3 of all us humans, it may be that this isn't a dataset that will follow Zipf's law inexorably, but instead will distribute like skin cells, some larger and smaller based on function, but none 1/3 of all the mass
or not, don't know, strange morning
@sydneyfalk @Ashrand 1/3 or 1/4 of the mass is indeed a lot for one conglomeration. It could be more, or it could be less. The exact shape of the distribution depends on the parameters, while the distribution itself is a good fit.
So in the future, when the fediverse population is 10m or 100m, the biggest instance may have a smaller weight, and yet the distribution would be Zipfian.
We'll see. This is an interesting phenomenon about emergence.
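To illustrate how the largest instance's share depends on the parameters, here is a minimal sketch under an idealized assumption: sizes follow an exact Zipf law, so the top share is 1 divided by the sum of r^-s over all ranks. The values 1.33777 and 1746 come from this thread; the other values of s and the instance count of 100000 are arbitrary choices for comparison.

```python
def top_share(s, n_instances):
    """Share of all users on the rank-1 instance under an exact Zipf law."""
    harmonic = sum(rank ** -s for rank in range(1, n_instances + 1))
    return 1.0 / harmonic


for s in (1.0, 1.33777, 1.6):
    for n in (1746, 100000):
        print(f"s={s}, instances={n}: top share = {top_share(s, n):.3f}")
```

In this idealized model the share depends on s and on the number of instances, not directly on the total population. For s above 1 the sum converges, so adding more instances lowers the top share only up to a point. A much smaller share would need a lower s.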
I'm personally hoping for more of a cell-style growth approach, because I'm increasingly convinced that hierarchical structures built to sink control of resources in a 1/3 Zipfian are kind of why there's bottlenecks of power and those are the source of a lot of unpleasantries
(but it's not like I'm really the person who would know, it's probably paranoid rambling)
So really having an atomic distribution, where even the largest instance would be a tiny fraction of the whole, would require a concerted effort to counteract the natural size distribution.
In a subset of instances this might happen, but on the whole there are always those who don't share the same goal.
@Stoori i tend to be more worried about the tail getting cut off than it disappearing naturally
We’ve had a lot of problems with spam lately. What if, in the future, the, say, ten biggest instances decided it was too much of a problem and that they’d only federate with each other? Sure, technically you could still run a small instance, but with so many of the newcomers at least starting out on m.s these days, it’s possible 99% would never even think to check if you were out there
@Satsuma This brings into my mind that it would be interesting to see how the spam accounts are distributed around the fediverse. Do they concentrate on big, medium or small instances?
I guess that data is almost impossible to gather.
@Stoori i know my instance (under 15,000) got a fair amount while we had open registrations, so they’re definitely not /just/ targeting huge instances
@Stoori this is obviously a more extreme example, and also something that’d be pretty shocking if it happened in the next several years
But large email servers' spam filters are notoriously aggressive towards small email servers so like, it’s not completely implausible
@Satsuma Yeah, that's a good point.
@Stoori If I may ask, why is an emergent system expected to follow Zipf's law?
@kdsch Ok, I'll take a step back and say that I do not in any way claim it to be a universal law, because I'm not into philosophical musings about statistical distributions right now.
It's just that many emergent systems *do* follow that kind of distribution, so it's reasonable to assume that any new emergent system *may also* behave similarly. And in this case the data supports the assumption.
@Stoori That's fine. I was just curious why you would expect that; I find it very interesting, and this was the first time I heard the idea. I'm not really a stats person, but I have read about emergence.
@kdsch @Stoori though one can explain power-law-like distributions from a purely mathematical or distributional point of view, the most intuitive explanation is that there is an underlying preferential attachment process, or mechanism, generating it (at least that's what i remember from my classes 😆 ); that is, some instances are big because they were bigger than others in the past, which gives them a higher chance of being chosen by new users (a positive feedback)
@marcelovmaciel @kdsch Of course, it boils down to the fact that people can choose their instance freely. If there was a mechanism to assign people randomly to different instances, the distribution would be totally different.
It will be interesting to look at the stats a year from now (etc.) and see if the s factor (in a sense a measure of concentration) stays stable or changes notably.
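The preferential-attachment explanation and the random-assignment contrast discussed above can be sketched with a toy simulation. This is a Simon/Yule-style model chosen for illustration, not a model of actual Mastodon sign-ups. The function names, the 50000 users and the 2% chance of founding a new instance are all hypothetical.

```python
import random


def preferential_attachment(n_users, p_new, seed=0):
    """Each new user founds an instance with probability p_new,
    otherwise joins the instance of a uniformly chosen existing user
    (so bigger instances are proportionally more likely to be picked)."""
    rng = random.Random(seed)
    sizes = [1]        # instance sizes; start with one instance and one user
    membership = [0]   # membership[i] = instance index of user i
    for _ in range(n_users - 1):
        if rng.random() < p_new:
            sizes.append(1)
            membership.append(len(sizes) - 1)
        else:
            target = membership[rng.randrange(len(membership))]
            sizes[target] += 1
            membership.append(target)
    return sorted(sizes, reverse=True)


def random_assignment(n_users, n_instances, seed=0):
    """Every user is placed on a uniformly random instance."""
    rng = random.Random(seed)
    sizes = [0] * n_instances
    for _ in range(n_users):
        sizes[rng.randrange(n_instances)] += 1
    return sorted(sizes, reverse=True)


pa = preferential_attachment(50000, 0.02)
ra = random_assignment(50000, len(pa))
print("preferential: largest", pa[0], "median", pa[len(pa) // 2])
print("random:       largest", ra[0], "median", ra[len(ra) // 2])
```

With free choice weighted by existing size, a few instances end up far larger than the median one. With random assignment over the same number of instances, the largest and the median stay close together.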
@Stoori Yeah, that's what I said last year. At that time, many people seemed to be taking the position that "decentralization" would cause the network to magically not exhibit the same statistics all other networks do, somehow, and instead for almost all users to be on instances with a few hundred or a few tens of users. But that has not happened; and because the protocol *still* requires all-to-all connections among instances, it would be highly undesirable anyway if it could be chosen.
@Stoori I love log-log distribution. One of the main metrics I use in my game design is to check for Zipf's law. I essentially use forensic accounting techniques to check and see if any of the stats/numbers/curves in the game feel forced.
@Stoori a meatspace parallel (which you may perhaps also have noticed) is that Shimano of Japan dominates the market for certain types of bicycle components, but there are still many other manufacturers (who may or may not copy Shimano's tech standards to some extent), and this is in a ruthlessly competitive capitalist market (the only main regulations on bicycles being domestic road and product safety standards)
Bicycles have ofc been around about 2 centuries, and Shimano since 1921.