Before we get deep into that, when we left last night I was extremely tired and had been working at my computer for over 14 hours. I then said I was going to drive two hours across the state that evening.
Thankfully, due to the support of people who love me, I did not do that foolish thing!
So anyway, I am better rested, and I also woke up to the surprise that our fundraiser is doing a lot better than it was yesterday, like by a lot, which is nice because I was extremely stressed out: https://spritely.institute/donate/
So I am feeling much better and alive and today I remembered to eat lunch
But you probably aren't here to hear about my lunch choices or how much sleep I got or whether or not I forgot to bring my ADHD medication with me (I did so now I am drinking a bunch of caffeine instead), you are probably here to hear the rest of the analysis about decentralization and Bluesky etc
So let us get to it, let's talk about whether or not Bluesky can scale *down* in a meaningful way.
In my last essay I made assertions that this was important for decentralization and said ATProto wasn't great for this, and this was one thing people challenged me on
So let's take a look!
When I say "scale down", what I generally mean is "small instances can generally participate on the network". (We'll talk about "scale wide" later.) But another useful possibility which has come up is "can you make a smaller, more isolated use-case and use the same protocol for it"
This latter version of scale down does come up in Bryan's article:
> A specific form of scale-down which is an important design goal is that folks building new applications (new Lexicons) can "start small", with server needs proportional to the size of their sub-network.
(cotd)
Strictly speaking, I agree, ATProto can scale down in this use case! For example, if you wanted to make a small specialized forum for collaborative storytelling, you could use ATProto for it, and that's true, you could do it
But is it the right choice?
In some ways we are talking about two different things here: extension of functionality (which you might want the same scale for) and having a smaller and more isolated community
But regardless
ATProto positions itself *specifically* as designed for not wanting to miss messages, and I talked previously about how ATProto's design requires a god's-eye view.
It's a bit of a strange choice when you say "let's run a smaller community"
Given that message passing systems handle small scale systems *beautifully*, and *still* allow for interactions with larger scale systems, it's a bit confusing to me *why* you'd choose ATProto for such use cases. What is the specific benefit you'd gain? Especially because it's actually lossier here
At any rate, there's a bit of conflation here. "It scales down" by saying "you can have an isolated community/use case that's oblivious to the rest of the system" is categorically distinct from "it scales down" in terms of "a small node can meaningfully participate with the larger system"
The problem with "scaling down" becomes much clearer when we look at the problem of "scaling wide".
Or let me put it a different way: ATProto *explodes in complexity* when you try to scale it towards meaningful decentralization
Yes, that's right, we're getting to the spicy part of this conversation. We did the warm-up, now it's time to talk about the real thing: whether or not decentralization, in the way I believe people *think* of that term, is reasonably possible with ATProto as it's currently designed
But before we do that, I need to stretch and run to the bathroom
So for those of you following along, if you found this, Secret Goblin #3, let me know: "👺"
Oops wait actually we gotta talk about that one for a sec there's a reason I left it in scare quotes
Why on earth is the textual descriptor for Unicode U+1F47A "JAPANESE GOBLIN", does anyone know?
It's a Tengu, right?
Despite being the only actually named "goblin" emoji, I feel awkward about this one because is it correct to call it a "JAPANESE GOBLIN" instead of just "TENGU"?!?!
I don't know!
If you have knowledge or OPINIONS about "👺"
Otherwise it's time for a
=== STRETCH BREAK ===
I'm back. It's time to talk about it: does Bluesky/ATProto suffer a "quadratic explosion" as we move from centralization towards *meaningful* decentralization?
I claimed it did, but I was challenged on this. What did I mean? Am I right or wrong?
It's time to find out!
In the previous blogpost I said the following:
> If this sounds infeasible to do in our metaphorical domestic environment, that's because it is. A world of full self-hosting is not possible with Bluesky.
(cotd)
The "decentralized ATProto is quadratic" quote, cotd:
> In fact, it is worse than the storage requirements, because the message delivery requirements become quadratic at the scale of full decentralization: to send a message to one user is to send a message to all. Rather than writing one letter, a copy of that letter must be made and delivered to every person on earth.
This was probably the thing I got the hardest pushback on from a team member of Bluesky, that it is not quadratic as we scale towards decentralization.
Truth be told, I don't have a degree in CS. Most of what I know I learned from studying independently and community resources. Was I wrong?
Just as a quick aside, regarding that comment about "agency", maximizing the agency of everyone (and more importantly, minimizing subjection!) sits at the heart of my ethical framework https://fossandcrafts.org/episodes/11-an-ethics-of-agency.html
So I don't disagree on that part, but that's an aside!
Now, I said I won't read replies until I am done summarizing things, and that's true, so maybe someone has gone out of their way and proven that I am wrong, that the claims in my article are factually incorrect and so on and so forth. I wouldn't know yet.
But... I don't think I'm wrong.
As said I'm very self-conscious about these things because I *don't* have formal CS training. But I do a lot of research and so I've tried to become knowledgeable about these things and this *seemed* like the correct analysis to me
Because of that, I turned to people who actually knew more than me
For one thing I derailed the entire Spritely morning standup by walking everyone through the scenario. I gave the story example, which I'll detail later.
But @dthompson didn't find the story helpful, too much narrative detail. "I need to work through this example independently." So he did.
@dthompson came back and laid it out in more formal terms and said I was right.
But I was still nervous, so I called up one of my old MIT AI Lab type friends and rambled about it to them on a call. What did they think?
"I think it's pretty clear immediately that it's quadratic. This is basic engineering considerations, the first thing you do when you start designing a system," they said.
Well that's a relief, why isn't it clear to everyone else, I asked?
So they suggested I lay it out to you as I did to them.
Let's start with the following:
- ATProto has positioned itself as "no compromises on centralized use cases". Well, in that case, let's say it can't do *worse* than eg ActivityPub. This includes with replies. You can't do *worse* than ActivityPub on replies and mentioning someone, etc.
- We will interpret the most centralized system as one where there's only one provider for storage and distribution of all messages: the least amount of user participation
- The flip side of the spectrum of maximum decentralization is the *most* amount of participation: every user self-hosts.
- Just as blogging is decentralized but Google (and Google Reader) are not, it is not enough to have just PDS'es in Bluesky be self-hosted. When we say self-hosted, we really mean self-hosted: users are participating in the distribution of their content.
- We will consider this a gradient. We can analyze how a system at the greatest extreme of centralization "scales towards" the greatest degree of decentralization.
- Finally, we will analyze both in terms of the load of a single participant on the network but also in terms of the amount of network traffic as a whole.
Okay. That is the structure we will use for our analysis. Let's compare "message passing" vs ATProto-style "global public shared heap".
So okay. Let's get the CS notation out of the way:
"Message passing" at full decentralization:
- O(1) from a single node's perspective
- O(n) from a whole-network zoom-out perspective (inherent: add a user, it's one more user)
Okay, that's reasonable and what you'd expect
"Public global no-missed-messages (or not worse than AP) shared-heap" ATProto style at full decentralization:
- O(n) from a single user's perspective (!)
- O(n^2) from a whole-network perspective (!!!!!!)
Oof I'd better back this up because that ain't good!
In other words, as our systems get more decentralized, message passing handles things fine. Individual nodes can participate in the network no matter how big it gets. The zoom-out for the network as a whole doesn't get more complicated as we add more users OR move more users towards self hosting.
Things are NOT good, if I'm correct above, as we make things more decentralized in the atproto-public-shared-heap model. The more self-hosting and indeed the more "full nodes" join, the more expensive it gets for each of the nodes, and the network EXPLODES!
Truly self-hosted atproto is NOT POSSIBLE!
And there is no solution to this without adding directed message passing. Another way to say this is: to fix a system like ATProto to allow for self-hosting, you have to ultimately fundamentally change it to be a lot more like a system like ActivityPub!
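To make the gradient concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model: every user posts one message a day with one intended recipient, and in the shared-heap case every "full node" has to receive every message. The function names are made up for illustration and don't come from ATProto or ActivityPub.

```python
def message_passing_load(users: int) -> tuple[int, int]:
    """Return (receives per node, receives network-wide) per day.

    Assumption: one message per user per day, one recipient each,
    so the totals don't depend on who hosts whom.
    """
    return 1, users


def shared_heap_load(users: int, full_nodes: int) -> tuple[int, int]:
    """Return (receives per full node, receives network-wide) per day.

    Assumption: every full node must receive every message.
    """
    return users, users * full_nodes


USERS = 1_000
for full_nodes in (1, 10, 100, USERS):
    per_node, network = shared_heap_load(USERS, full_nodes)
    print(f"{full_nodes:>5} full nodes: {per_node} per node, {network} network-wide")
```

With one full node the network-wide total equals the number of users. When `full_nodes == users` (everyone self-hosts), it becomes `users * users`.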
Now I left more of the precise analytical explanation in my blogpost. But social media isn't great for that, so go check out my blogpost if you want to go through all that (eg if you're more like @dthompson and less like me, I'm a narrative person) https://dustycloud.org/blog/re-re-bluesky-decentralization/
Here's our story:
- We have 26 users: [Alice, Bob, Carol, ... Zack].
- Each user sends one message per day, which is intended to have one recipient. (This may sound unrealistic, but it's fine for modeling.)
- Each user sends a message in a ring: Alice => Bob, Bob => Carol, ... Zack => Alice
Now just before you say "wait but ATProto isn't for DMs", yes, but one way this could happen is that eg Bob follows Alice, Carol follows Bob, etc.
What I'm saying is, messages can have an "intended audience". That's what we're using here.
Before we get into this, remember, the main difference between "message passing" and the "shared heap" is the former has directed and delivered messages, the latter does not. See prev blogpost for explainer.
So, what happens in a day for both systems? Because that's what we really want to find out.
Under message passing, Alice sends her message to Bob. Only Bob need *receive* the message. So on and so forth.
- For an individual self-hosted node, messages passed per day: 1.
- Per the decentralized network, total messages passed zooming out: 26.
That's about what we'd expect.
Under the public-gods-eye-view-shared-heap model, each user must know of all messages to know what may be relevant. Each user must *receive* all messages.
- Individual self-hosted server, 26 messages must be received per day.
- Zoom out on whole decentralized network: 26*26: 676!
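The day described above can be replayed as a small simulation. This is only a sketch: the 26 letters stand in for Alice through Zack, `simulate_day` is a hypothetical helper, and the shared-heap branch counts a node's own message among the ones it holds, which matches the 26-per-node figure used here.

```python
import string


def simulate_day(users, model):
    """Return a dict mapping each user's node to the messages it received."""
    inboxes = {user: [] for user in users}
    for i, sender in enumerate(users):
        # Ring: A => B, B => C, ..., Z => A
        recipient = users[(i + 1) % len(users)]
        message = (sender, recipient)
        if model == "message_passing":
            # Directed delivery: only the intended recipient's node gets it.
            inboxes[recipient].append(message)
        elif model == "shared_heap":
            # Every node must see every message to find what is relevant.
            for node in users:
                inboxes[node].append(message)
        else:
            raise ValueError(f"unknown model: {model}")
    return inboxes


users = list(string.ascii_uppercase)  # 26 stand-ins for Alice..Zack

for model in ("message_passing", "shared_heap"):
    inboxes = simulate_day(users, model)
    per_node = len(inboxes["A"])
    total = sum(len(box) for box in inboxes.values())
    print(f"{model}: {per_node} per node, {total} network-wide")
```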
Sounds survivable with 26 users though, right?
Let's try just adding 5 more users.
Message passing:
- Per node per day: no change.
- Per the network: 5 more messages.
Public gods-eye-view-shared-heap-model:
- Per node per day: 5 more per day
- Per network: ((31 * 31) - (26 * 26)): 285!
Now, could we handle a million self hosted users? Is it possible? No problem in message passing. EXPLOSIVE with atproto.
What if we had a million users and added just 5 more? How many more messages must the network bear?
5 new messages in message passing.
*10,000,025* new messages sent in atproto!
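The arithmetic behind those numbers is short enough to check directly. A sketch, under the same one-message-per-user-per-day assumption; `added_receives` is a hypothetical name:

```python
def added_receives(users: int, new_users: int) -> tuple[int, int]:
    """Extra receives per day after adding new_users to a network of users.

    Returns (message passing, shared heap).
    """
    message_passing = new_users
    # (n + k)^2 - n^2, which expands to 2*n*k + k^2
    shared_heap = (users + new_users) ** 2 - users ** 2
    return message_passing, shared_heap


print(added_receives(26, 5))         # (5, 285)
print(added_receives(1_000_000, 5))  # (5, 10000025)
```

The `2*n*k` term is the part that hurts: the cost of adding each new node grows with the size of the network it joins.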
"Christine that's ridiculous, we're not expecting a million self-hosted users"
Well I think it would be nice!
But regardless, ActivityPub has 27,000 servers on it, all meaningfully participating in the network.
ATProto, in its current design, would be crushed to DEATH
"But Christine", you may say, "I heard gossip might fix this!"
No. It cannot.
In fact, I was being more generous than a gossip network, and assumed you only *received* a message once.
With gossip you might *receive* more than once.
But you need to receive a message to know it.
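A toy flooding gossip shows why. This is a sketch, not any real gossip protocol: the ring topology and the `flood` helper are assumptions chosen for illustration. Each node forwards a message to its neighbors the first time it sees it, and every arriving copy counts as a receive, duplicate or not.

```python
from collections import deque


def ring(n):
    """Hypothetical topology: each node talks to its two ring neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def flood(neighbors, origin):
    """Flood one message from origin; return (nodes reached, total receives)."""
    seen = {origin}
    receives = 0
    queue = deque([(origin, None)])
    while queue:
        node, came_from = queue.popleft()
        for peer in neighbors[node]:
            if peer == came_from:
                continue  # don't echo straight back to the sender
            receives += 1  # a copy arrives at peer, duplicate or not
            if peer not in seen:
                seen.add(peer)
                queue.append((peer, node))
    return len(seen), receives


n = 26
reached, receives = flood(ring(n), origin=0)
print(f"one message: reached {reached} nodes with {receives} receives")

total = sum(flood(ring(n), origin)[1] for origin in range(n))
print(f"all {n} daily messages: {total} receives network-wide")
```

Whatever the topology, every node other than the origin has to receive each message at least once, so the total can't drop below `n * (n - 1)`. Gossip changes who sends the copies, not how many must arrive.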
ATProto was designed for a "big world" view. That's fine! But I'm trying to show seriously what happens if it were actually, really decentralized.
*Every* fully participating node added to the network makes the network explosively more expensive.
ATProto doesn't scale towards decentralization.
In other words, the public god's-eye-view allows for a pantheon, but not a civilization. You can only have so many gods who see all.
An important characteristic of a decentralized system is scoping what you *don't* need to know.
This wasn't in the design goals of ATProto, and it has effects.
I may be coming across as some academic computer science nerd. It's actually the opposite. I'm a humanities nerd who cares about the agency of users so much I've twisted myself into a shape where I can do a computer science thing.
But architecture matters. It affects the worlds we can have.
This is what I mean when I say that Bluesky's goals of "credible exit" may be reasonable, but it's not decentralized. There is no getting around the fact that the system, as designed, is designed for a few large players. Small players can play on the *periphery*, but they can't play the big game.
Now, you might think, maybe ATProto could fix this!
And it can.
And the solution, ultimately, will end up looking... a lot like ActivityPub.
The point is that nearly everyone knows at this point that "sure, Bluesky is centralized today, in practice!" But a lot of the responses I see are "but decentralization is just around the corner thanks to ATProto!"
So that's why I'm writing this out.
Well, that's it. We've reached as far as we're going tonight.
There's still a bit left, a bit of reframing about what I am and am not concerned about with decentralized identity, and then a bigger topic about Bluesky's design goals vs community expectations. Then we'll talk about values.
Those last two, expectations and values, are really important to me. And I think they'll maybe be the most thoughtful part of all of this.
Of course, they're probably not what most people care about from me, about this. Probably what I've said is all many care to hear from me and that's fine.
For those who care about such things, tune in tomorrow, where hopefully we'll wrap this up. For those who were just hoping to hear the decentralization analysis, hope you found it useful.
Regardless, I wish you a very happy
=== REST OF TODAY BREAK ===
Well hello.
So yesterday I stepped onto a crumbled piece of sidewalk, twisted and sprained my ankle, and fucked up my wrist. That, and I think I've said the most important things and this is day *three* of summarizing things from my blogpost, so I will be brief.
It was nice to be prompted about @spritely's values and it led to a good conversation internally, and we did capture those in my blogpost, but I think that should be covered again from a more official organizational side, separate from this.
I also clarified a bit: the parts I'm concerned about with the did:plc stuff aren't as much the governance, and I think Bluesky is taking some good steps there by planning a certificate transparency log. That's good. Glad to see it.
I do think Bluesky is heading in a tough direction though in terms of community expectations vs the ATProto philosophy that replication and indexing of a firehose are the primary way things work.
It's a tough situation but Bluesky is speedrunning Twitter so fast it practically is Twitter.
People want Bluesky's devs to prevent their content from being replicated and indexed by people they don't like. Well, I think it really is that: a *conflict*.
People were encouraged to join a Twitter replacement, they are expecting Twitter-like solutions. Can't blame 'em.
@cwebber I hate when they say it isn't realistic as if there aren't people like me hosting 3 separate iterations of AP clients
@cwebber Interesting thread, thanks!
@cwebber Look, even not being able to follow all the technical details, I am glad that you're having these thoughts and that they've sparked such a good conversation. Thanks.
@cwebber im sure this will be a very interesting (if niche) book
@cwebber Excellent points and you bring some clarity to the situation that all should be aware of.
@cwebber My only significant comment on this part is that there's a third option that few people seem to consider, but that @librecast is built around and @interpeer as well, albeit at a slightly different level:
What you call message passing would be unicast in a packet switching protocol like IP. One sender, one recipient.
What you call shared heap would be more like broadcast. One sender, all recipients.
There's also multicast. One sender, interested recipients *only*.
The fun part is...
@cwebber ... that ActivityPub kind of could be multicast, and kinda sorta is.
Both are effectively publish subscribe mechanisms.
The issue with AP (as I understand it), is more one of implementation choices than conceptual things.
Because to make multicast work, you (TL;DR) address groups, and have the routers/instances subscribe to those groups on behalf of their users, then redistribute to the local subscribers.
It's more efficient the moment a recipient group has >1 members.
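A rough sketch of the three delivery modes described in this reply, counting copies of one post sent over the network. Everything here is a simplifying assumption: the `copies_sent` helper, the example hostnames, and counting broadcast as one copy per instance.

```python
def copies_sent(hosting, interested):
    """Count copies of one post sent over the network under each mode.

    hosting: dict mapping user -> instance that hosts them
    interested: set of users who actually want the post
    """
    all_instances = set(hosting.values())
    interested_instances = {hosting[user] for user in interested}
    return {
        # One copy per interested recipient.
        "unicast": len(interested),
        # One copy per instance, whether anyone there cares or not.
        "broadcast": len(all_instances),
        # One copy per instance with at least one subscriber; the
        # instance then redistributes locally.
        "multicast": len(interested_instances),
    }


hosting = {
    "bob": "a.example", "carol": "a.example", "dave": "a.example",
    "erin": "b.example", "frank": "c.example", "grace": "d.example",
    "heidi": "e.example", "ivan": "f.example",
}
interested = {"bob", "carol", "dave", "erin"}

print(copies_sent(hosting, interested))
```

In this example three of the four interested users share an instance, so multicast sends fewer copies than unicast, and both stay below broadcast.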
@cwebber This is doable at multiple layers, also in the social layer.
It can get more efficient when you have that support down the stack, which is the aim of our respective projects.
@cwebber Yeah it's like... they massively benefit from "selling themselves" as Twitter-but-good despite really wanting to be Mastodon-but-more-Twitter-like and that's a very different set of expectations
@cwebber get better soon
@cwebber hope you get better soon.
@cwebber I'm curious about your expectations and values. When the US election results came in, I remember you reflecting how you could best fight the upcoming political [bleep], and your conclusion was to continue your work on Spritely. I think that's immensely powerful. How many people can say that they're working on what they believe to be truly important? (Even if it includes the dreaded raising of funds.)
Rock on Christine, but don't forget to take care of yourself!
@cwebber THIS is what I've been trying to explain to people. People keep saying "well you don't have to federate with everything, you don't have to have a full 100% relay"
Sure, you don't. But what you end up with are the exact same things about fedi people complain about: "I can't see every post", "stuff is missing", "I can't find my friends"
Just because you can do something on a technical level doesn't mean you can do it and still have people engage with it the way they do with the current 100% centralized, corporate-funded Bluesky. And those people will be a lot less interested in a lesser, self-funded Bluesky (or other ATProto services) if the "official" one dies.
@cwebber I think you meant to write "centralized today, in practice" here, right?
@josef Yep oops, fixed
@cwebber It's hard not to imagine, with AP being so *available* prior to the start of bsky development, that the choice to create AT Proto was an intentional choice to stealth centralize.
Why go to all the engineering effort to develop something when the existing protocol is right there?
@cwebber We need symmetry and Big Tech will never ever give it to us even if they promise.
@cwebber it’s why i love you
@cwebber that is a great point, basically to apply Conway's law at scale here!
@cwebber it hit me, reading this, that ATProto is more or less trying to reimplement Usenet News without some of the features that made it possible to actually host a news node. I’m sure this isn’t a new observation, by any means, but remembering what it was like to try and keep even a partial feed node running… well, times have changed since the ‘90s but their design doesn’t make me think the ATProto folks remember news, or its issues.
@cwebber I’ve thought about this since ATProto came on the scene, and this further reinforces my view that ATProto was going to be Twitter’s attempt to monetize corporate control of social media, while giving the appearance of user freedom and choice. Spinning off from Twitter was just an unfortunate side effect of Musk’s takeover.
@cwebber I am as you know on your side in all this but there are people I disagree with who argue that having an incomplete view of all messages globally on ActivityPub is not "meaningfully participating" and I have trouble articulating my position with these people. Feels like a dead end where persuasion is highly unlikely
@darius would one answer be to point out that on that perspective humans in real life would only ever ‘meaningfully participate’ in exceptional circumstances (i.e., when there is a wholly closed conversation for which all participants are present)
@semitones right. Seems intractable
@dynamic @cwebber @semitones right. I meant all messages period, not all replies to a post you're looking at
@darius @dynamic @cwebber maybe this is just Mastodon brain, but the things I've noticed "missing" are predominantly 1. Posts with hashtags 2. Replies on posts 3. Someone's post history (before I've followed them). Not a showstopper for me, but definitely not what people expect from a Twitter-clone. Sounds like the Bluesky "zero-compromise" worked as a drop-in replacement for Twitter in a way in which decentralized social media maybe can't?
"Not a showstopper for me, but definitely not what people expect from a Twitter-clone."
Alternatively, there could be 3 or more available (and selectable) reply threads: 1) The OP’s organic replies, 2) The OP’s personally moderated replies, and 3) curated reply threads by 3rd parties. The latter might often be the most valuable.
To me the *ideal* is for *everyone* who uses social media to either self host or to be hosted by someone they know and trust personally. Even a tiny fraction of this would be far, far more than a million users!
Will we get there? Who knows? Few would have predicted that we would even get to where we are now, and numbers are only growing...
@cwebber if this math is right, bluesky cannot decentralize in a way that delivers the expected meaning of the word to the humans participating.
@cwebber one thing I am surprised no one has mentioned.. the very philosophy of a god's-eye view is inherently a centralizing one?
@fleeky @cwebber I'm going to be thinking about this for a while. I don't think that's what I mean in my stats classes when I use similar phrases (e.g., "If you were a godlike being who could see all cases and observations in the population, not just your sample...") because I'm thinking such a being would see all things at the same time.
However, perhaps the entire concept of one being seeing anything, no matter how vast and inclusive the view, also implies one perspective.
Now I'm thinking about frequentist vs. Bayesian stats vs. other pathways to numerical knowledge...