Correct me if I’m wrong. I read the ActivityPub standards and dug a little into the Lemmy sources to understand how federation works, and I’m a bit disappointed. Every server just has a cache and the ability to fetch something from another known server. So if you start your own instance, there is no benefit to the network as a whole until it has a significant audience (private instances or servers with no users, for example, contribute nothing). Are there any “balancers” to make use of these empty instances? Should we promote (or create in the first place) a way to passively help Lemmy cope with such fast growth?
Every instance is sharing in the traffic to browse the fediverse. No single server is responsible for serving all content; you (the instance admin) only serve your own members.
The downside is that a huge amount of replicated data is stored everywhere. Content from popular communities will be scraped by and stored on many, many servers, filling up their disks and increasing storage and bandwidth bills for all of them.
I’m not sure your second paragraph is correct. First of all, replication is “just in time”, so content is only replicated if somebody on that instance is following it. More importantly, I read a statement from a server owner somewhere that the software regularly purges older content (and refetches it “just in time” when somebody tries to view it) to keep storage size down.
If that is the case, wouldn’t their first paragraph be incorrect too? If replication is “just in time” with quick purging, the origin server still has to keep serving the content to the other instance. Caching would only help if many users on that instance view the exact same content at around the same time (so maybe for the “massive” communities?)
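The point about purging can be checked with a toy model. This is a minimal sketch, assuming a fixed purge window after each fetch; whether and how Lemmy actually purges old content is not confirmed in this thread, and `count_origin_requests` and the one-hour `TTL` are hypothetical. It counts how often a remote instance has to go back to the origin for one object.

```python
from typing import List, Optional


def count_origin_requests(request_times: List[float], ttl: float) -> int:
    """Count re-fetches of one object from its origin instance.

    Hypothetical model: the cached copy is purged `ttl` seconds after it was
    fetched, and the next view after that triggers a just-in-time re-fetch.
    """
    fetched_at: Optional[float] = None
    origin_requests = 0
    for t in sorted(request_times):
        # Cache miss: never fetched, or the cached copy has been purged.
        if fetched_at is None or t - fetched_at >= ttl:
            origin_requests += 1
            fetched_at = t
    return origin_requests


TTL = 3600.0  # assumed purge window of one hour, purely illustrative
burst = [0, 10, 20, 30, 40]              # five views within a minute
spread = [0, 4000, 8000, 12000, 16000]   # five views spread over hours

print(count_origin_requests(burst, TTL))
print(count_origin_requests(spread, TTL))
```

With these inputs the burst costs the origin one request, while the spread-out views cost five, which matches the intuition that the cache mostly pays off when many users look at the same thing at roughly the same time.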
From my understanding, you are correct. Each instance is responsible for serving all of the content of the communities created on it. So many small instances with a few communities each = good; a few huge instances with lots of communities = bad.
The purging of older content is good news; I didn’t know that.
Please elaborate on how “every instance is sharing in the traffic to browse the fediverse”. I didn’t find anything like that in either the AP standards or the activitypub_federation library docs. If there is some balancing mechanism inside Lemmy’s code, would you mind pointing me to it?
Looking into the database, I see it contains many thousands of posts. I’m assuming these are stored in the local DB so they can be served to instance members. So when you open a post from instance B on instance A, A fetches the post data from B, stores it in A’s database, and then serves the content from A’s database to the browser.
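The flow described above is essentially a read-through cache. Here is a minimal sketch of it, assuming an in-memory dict stands in for A’s database and a plain function stands in for the HTTP request to B; `Instance`, `get_post`, `origin_b` and the `*.example` URLs are hypothetical and not Lemmy’s actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Instance:
    """Hypothetical model of instance A's local post store."""
    name: str
    fetch_remote: Callable[[str], dict]  # stands in for fetching the object from its origin
    db: Dict[str, dict] = field(default_factory=dict)
    remote_fetches: int = 0

    def get_post(self, ap_id: str) -> dict:
        # Serve from the local database if we already have a copy.
        if ap_id in self.db:
            return self.db[ap_id]
        # Otherwise fetch it from the origin instance and store it locally.
        self.remote_fetches += 1
        post = self.fetch_remote(ap_id)
        self.db[ap_id] = post
        return post


def origin_b(ap_id: str) -> dict:
    # Hypothetical origin instance B returning a made-up minimal post object.
    return {"id": ap_id, "name": "Hello from B"}


a = Instance("a.example", fetch_remote=origin_b)
for _ in range(3):
    a.get_post("https://b.example/post/1")
print(a.remote_fetches)
```

Three views of the same post on A result in a single fetch from B; the other two are served from A’s own copy.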
Yes, you are right, provided the instance has members. A server will actively fetch “foreign” content and cache it when one of its own users asks for it. But aside from the top 10 servers, there is no benefit to having more instances until each has at least a couple of dozen users. If any server were able to “delegate” request handling to less busy servers, that would solve this uneven load.
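To make the “delegate to less busy servers” idea concrete, here is a toy least-loaded balancer. This is purely hypothetical: as far as this thread knows, nothing like it exists in ActivityPub or Lemmy, and `assign_requests`, the peer names and the load numbers are made up for illustration.

```python
import heapq
from typing import Dict


def assign_requests(peer_load: Dict[str, int], n_requests: int) -> Dict[str, int]:
    """Hypothetical balancer: send each request to the currently least-busy peer.

    Ties on load are broken by peer name (heap order on (load, name) tuples).
    """
    heap = [(load, name) for name, load in peer_load.items()]
    heapq.heapify(heap)
    assigned = {name: 0 for name in peer_load}
    for _ in range(n_requests):
        load, name = heapq.heappop(heap)
        assigned[name] += 1
        # The chosen peer is now one request busier.
        heapq.heappush(heap, (load + 1, name))
    return assigned


peers = {"big.example": 100, "small-a.example": 0, "small-b.example": 2}
result = assign_requests(peers, 6)
print(result)
```

In this example all six requests land on the two idle instances and none on the busy one. The harder questions a real scheme would face, such as whether a delegate can be trusted to serve another instance’s content, are not modelled here.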
The replication isn’t all that bad. Images stay on their local instance; the federated data is all JSON payloads and metadata. Yes, it will pile up over time, but only instances with hundreds of users and thousands of indexed communities are at risk of needing massive storage.