Social.coop @SocialCoop

Recent searches

Search options

Only available when logged in.

**Now at cwebber@social.coop !** @cwebber@octodon.social · Nov 8, 2024 *

Nov 8, 2024 *

Now at cwebber@social.coop ! @cwebber@octodon.social

Is the storage expectations for self-hosting ATProto including a relay really 5tb (with the expectation also that this will grow)? https://alice.bsky.sh/post/3laega7icmi2q

On Linode's default shared hosting that's getting into a full salary, like $55k/year, territory https://www.linode.com/pricing/

Is this right?

alice.bsky.shalice.bsky.shthis is a blog. there are many like it, but this one is mine.

**Now at cwebber@social.coop !** @cwebber@octodon.social · Nov 8, 2024

Nov 8, 2024

Now at cwebber@social.coop ! @cwebber@octodon.social

Every self-hosted user on the instance also needs to fetch every self-hosted user on the network, which also seems it would mean that the amount of network traffic for retrieving information is O(n^2)?

**Clairement crevée** @Claire@sitedethib.com · Nov 8, 2024

Nov 8, 2024

Clairement crevée @Claire@sitedethib.com

@cwebber if by “instance” you mean the relay+AppView, no, users would just query it and get ready-to-use results like you can currently do on bsky.app?

but yeah, you are not meant to self-host your own bluesky, only your own data repository basically

**Now at cwebber@social.coop !** @cwebber@octodon.social · Nov 8, 2024

Nov 8, 2024

Now at cwebber@social.coop ! @cwebber@octodon.social

@Claire The article I pointed to was about fully self hosting your own atoproto infrastructure, including relay, and people seem to be getting excited about it on bluesky as being feasible, and I'm feeling uncertain it is

**damien (God's favorite little kitty)** @eramdam@erambert.me · Nov 8, 2024

Nov 8, 2024

damien (God's favorite little kitty) @eramdam@erambert.me

@cwebber @Claire Clearly for an individual it's far too expensive to run the whole stack, but I guess the expectation is that 3rd party will like... form to gather that kind of resources/funding to run relays? Still unclear to me how things actually work when more than one Relay exist. Is the Bluesky app supposed to be able to switch between multiple at once?

**Clairement crevée** @Claire@sitedethib.com · Nov 8, 2024

Nov 8, 2024

Clairement crevée @Claire@sitedethib.com

@eramdam @cwebber the front-facing Bluesky app is built to use Bluesky PBC's AppView, i don't think it has any provision to switch to another provider with the same API (but i might be wrong)

you can theoretically build an equivalent relay+AppView that works in the same way with the same data (although you have yet to build some of the pieces yourself, not everything in bsky is open source afaik), but it's unclear what incentives you'd have to do that

**Now at cwebber@social.coop !** @cwebber@octodon.social · Nov 8, 2024

Nov 8, 2024

Now at cwebber@social.coop ! @cwebber@octodon.social

@Claire @eramdam Well I mean, Bluesky's ATproto recently has made the rounds as the "more decentralized" protocol over ActivityPub, and clearly a lot of people on my Bluesky feed right now seem to think it's more decentralized, and it does have one major improvement which is that it uses content-addressed posts (I don't think the DID methods currently actually do their job though the goal is good, I can expand on that at a future time)

Which is what's leading me to look more into it, in which ways is it more decentralized in practice? Is it even particularly decentralized in practice? I'm trying to gain a technical understanding.

**bryan newbold** @bnewbold · Nov 8, 2024

Nov 8, 2024

bryan newbold @bnewbold

@cwebber @Claire @eramdam don't think the goal w/ atproto is to be "more decentralized" in the abstract. we (team) had worked on SSB and dat, which were radically decentralized/p2p but hard to work with and grow. would not supplant "the platforms".

atproto came out of identifying the *minimum* necessary decentralization properties, and ensuring those are strongly locked in. we basically settled on:

**bryan newbold** @bnewbold · Nov 8, 2024

Nov 8, 2024

bryan newbold @bnewbold

@cwebber @Claire @eramdam
unbundling and composition: ability to swap components w/o swapping full stack.

credible exit: ability for new interoperable providers to enter network. public data must be accessible w/o permission, and schemas/APIs declared

control identity: whole network/proto doesn't need to be individualist, but identity is personal

easy to build new apps: don't build for the old modalities, enable new modalities. accomodate opinionated devs.

**bryan newbold** @bnewbold · Nov 8, 2024

Nov 8, 2024

bryan newbold @bnewbold

@cwebber @Claire @eramdam I have a longer post on this, and our progress, on my personal blog:
https://bnewbold.net/2024/atproto_progress/

bnewbold.netbnewbold.net

bryan newbold @bnewbold@social.coop

@cwebber @Claire @eramdam
I think a bunch about this post about the history of mp3 piracy and "minimum viable decentralization":
https://web.archive.org/web/20180725200137/https://medium.com/@jbackus/resistant-protocols-how-decentralization-evolves-2f9538832ada

(though it wasn't directly influential on atproto design, and Backus has since pulled the post)

Nov 08, 2024, 07:16 PM·

0boosts·5favorites

**Risotto Bias** @risottobias@tech.lgbt · Nov 8, 2024

Nov 8, 2024

Risotto Bias @risottobias@tech.lgbt

@bnewbold @cwebber @Claire @eramdam

don't expect a tracker company that plans to make money on the big data of tracking torrents to fully embrace the magnet file specification or the DHT in favor of their cure.

e.g., the "VPNs cure everything" of social media

you can't make someone learn that which their job depends on them not getting.

**bryan newbold** @bnewbold · Nov 8, 2024

Nov 8, 2024

bryan newbold @bnewbold

@risottobias @cwebber @Claire @eramdam the value of data-for-sale is usually proportional to how exclusive access to it is. if data is public, it is commodity and "worth" a whole lot less.

IMHO client apps are way under-appreciated in this regard: they can track attention/behavior way better than API servers can.

**Nemo_bis** @nemobis@mamot.fr · Nov 9, 2024

Nov 9, 2024

Nemo_bis @nemobis@mamot.fr

@bnewbold Thank you Bryan, this is extremely helpful!

I hope to see multiple #BlueSky relays soon (incentives unclear: https://neuromatch.social/@jonny/113364719373034539). I worry about the climate costs of many full copies.

One accidental design feature in #Mastodon is how an instance serves as "relay" with a cache of posts and media caused haphazardly by whatever happens to federate. This is messy but flexible. https://masto.host/re-mastodon-media-storage/ Instances can share deduplicated object storage. https://jortage.com/

Jortage storage service for Mastodon instances: « The Storage Pool project is currently storing 41.80TiB across 122.93 million objects for all our members. Without deduplication, it would be storing 93.52TiB across 290.83 million objects.»

«Let's talk about a simplified real-world example. Someone with 1,000 followers, across 100 instances, makes a post with 4 media attachments, 2MB each. Those 100 instances are pushed the status by the original instance, and they all immediately download it from the original instance. This causes a surge of traffic, totalling 800MB. All these instances then upload this media to their storage provider, and if multiple of those instances are using Backblaze, then Backblaze themselves performs deduplication, but doesn't share the benefit. The fediverse has just grown by 804MB, and any Wasabi-using instances have to pay for that for at least 90 days, even if they delete it.

Let's say 10 of those instances use the Jortage Storage Pool, and so does the originating instance. The standard case has the same surge in traffic, but instead of 80MB being stored for those 10 instances, 0MB is stored, because Jortage already has the files. Additionally, the traffic surge is absorbed by our CDN, being designed for precisely this kind of issue. If the original instance doesn't use Jortage, then only 8MB is stored. The traffic surge is difficult to prevent due to how Mastodon's media upload system works, and the facts of how S3 works.»

#FediMeta

**Luca Sironi** @luca@sironi.tk · Nov 9, 2024

Nov 9, 2024

Luca Sironi @luca@sironi.tk

@nemobis @bnewbold

very interesting, but if we bring decentralization to the extreme of almost personal server, up to say 100 users, which is a thing activitypub allow to do with bare minimum hw/network...

there is no need to have a remote storage for caching media.

A server with reasonably few users, download just a fraction of media from the fediverse, and don't have to keep them forever.

We are chatting, not mirroring the whole internet.
If you see something on internet, a beautiful picture, a long article, and you like it, you save it, you don't expect it to find it online next year

**Nemo_bis** @nemobis@mamot.fr · Nov 9, 2024

Nov 9, 2024

Nemo_bis @nemobis@mamot.fr

@luca It's the opposite. The amount of media to be cached per user grows less than linearly with the amount of users on an instance. A single-user instance has the highest level of storage waste. An instance with a million users (like mastodon.social) already mirrors pretty much everything so they would have little benefit from deduplication.

You don't need to guess, we already have actual numbers. Check the figures of Jortage members.

**Nemo_bis** @nemobis@mamot.fr · Nov 9, 2024 *

Nov 9, 2024 *

Nemo_bis @nemobis@mamot.fr

@luca We're saying the same thing but I'm not sure about the details. Perhaps the optimal size for storage is closer to 1000 users rather than 100. I'm not sure I'd call 1 TiB of storage (the average in the Jortage pool) "minimal". It's still much better than 90 (what they'd need if each member of the pool mirrored everything).

Recently #Mamot started purging old media and posts with the extremely broken https://github.com/mastodon/mastodon/discussions/19260 . It's not clear why, perhaps to simplify PostgreSQL scaling.

GitHubRE: retention policy for cached content and media · mastodon mastodon · Discussion #19260The recent PR #19232 adding a retention policy for cached content and media can significantly reduce the resources used by Mastodon. In particular, the storage needs. This is something that admins ...

**Luca Sironi** @luca@sironi.tk · Nov 9, 2024

Nov 9, 2024

Luca Sironi @luca@sironi.tk

@nemobis

It's all very clever, i really appreciate all attempts of optimizing storage, that's the only way big and medium instances can survive.

My point is, my single user instance has the highest level of storage waste but it's already there, paid.
How many people will see a picture because it's cached on my pleroma for a month ?

Of course this works because one user is one, i receive daily just the amount of media i'm able to consume

What if all users were standalone ?
The most wasted internet resource is residential landline bandwidth

**Nemo_bis** @nemobis@mamot.fr · Nov 9, 2024

Nov 9, 2024

Nemo_bis @nemobis@mamot.fr

@luca A single-user Pleroma instance is indeed a very different matter because Pleroma relies on hotlinking from remote instances. Only other Pleroma users or people who visit your instance directly will see your own media served by your server. I see it through Mamot's cache.

**your auntifa liza** @blogdiva@mastodon.social · Nov 9, 2024 *

Nov 9, 2024 *

your auntifa liza @blogdiva@mastodon.social

“What if all users were standalone ?
The most wasted internet resource is residential landline bandwidth”

this is EXACTLY what i want to set up: a localhost setup that is my instance, used to post on not just Mastodon but wherever TF i have it connected through Oath, RSS, etc.

this is the true decentralization, not me having all my data housed on someone’s server.

so if i delete my account anywhere, i still have my posts locally.

my main web instance should be a mirror.

@luca @nemobis

**bryan newbold** @bnewbold · Nov 9, 2024

Nov 9, 2024

bryan newbold @bnewbold

@nemobis atproto relays currently mirror only "records", not media blobs, so size isn't too crazy.

we think a degree of duplication and mirroring is good/healthy for the network. similar to having multiple copies of git repo checkouts. but a few dozen full network copies is probably plenty?

resource/climate-wise, what we see in our infra is that "reverse chron" timelines, and to some degree notification tracking, are expensive (much more than relay). ironically "algo feeds" are cheaper?

**bryan newbold** @bnewbold · Nov 9, 2024

Nov 9, 2024

bryan newbold @bnewbold

@nemobis at least for us, now, at current scale, for the bsky app, "reads" are more expensive (in kilowatts and silicon) than "writes". I don't think this would be particularly more efficient if network distributed the load more to many smaller nodes (vs having big API servers).

it is possible to "shard" parts of network for things like search queries. https://yacy.net/ might be relevant. I don't think that helps with efficiency though?

yacy.netHome - YaCyYaCy P2P - Decentralized Search Engine

**Nemo_bis** @nemobis@mamot.fr · Nov 9, 2024

Nov 9, 2024

Nemo_bis @nemobis@mamot.fr

@bnewbold Interesting! IIRC one of the early Twitter papers had great findings on how they handled sorting. Broadcasting is easier.

I can easily believe a single centralised relay is more efficient (as long as it fits the biggest machine you can get). A few dozens relays sounds reasonable; 10k (the number of Mastodon servers) less so. Relays would need to be pooled resources. A managed service à la masto.host would offer just one, shared by hundreds/thousands of customers.

**Nemo_bis** @nemobis@mamot.fr · Nov 9, 2024 *

Nov 9, 2024 *

Nemo_bis @nemobis@mamot.fr

@bnewbold Sharding metadata across database servers is challenging. No idea how to solve that problem. It would be fascinating to compare the resource efficiency of the DB servers at a large Mastodon instance like mastodon.social vs. hundreds of small ones across 20 DB servers like on masto.host. Any DB researchers in the room?

Drag & drop to upload

Recent searches

Search options

Administered by:

Server stats: