So remember how I was ranting about everyone dramatically over-provisioning Mastodon's streaming server?
Well, for an idea of how well it scales now that we've fixed some really poorly performing code: mastodon.social runs 3 instances of the streaming server supporting ~10k concurrent connections.
Yeah, ~3k connections per pod is a little high, but it manages to handle it. Event loop lag & GC lag may be higher with that many concurrent connections.
Emelia
in reply to Emelia
Generally I'd aim closer to ~2k concurrent per pod as a maximum.
That way a pod dying takes out 1/5th of the connections instead of 1/3rd (with ~10k connections, that's five pods at ~2k each rather than three at ~3.3k), because handling those reconnections is the most expensive part.
If you're looking at the metrics that streaming exposes and constantly seeing only <1000 concurrent connections, you probably only need a single instance, maybe two for failover.
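As a rough sketch of that sizing rule (the ~2k-per-pod target is the figure from this thread, not a hard limit, and the peak number is whatever your own metrics report):

```typescript
// Back-of-the-envelope sizing for streaming instances.
// peakConnections comes from your own monitoring; 2_000 is the per-pod target above.
const peakConnections = 10_000;
const targetPerPod = 2_000;

const streamingInstances = Math.ceil(peakConnections / targetPerPod); // => 5
console.log(`Run ${streamingInstances} streaming instances`);
```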
#mastoadmin
Emelia
in reply to Emelia
So if you're reading anything old about setting STREAMING_CLUSTER_NUM high for "performance":
a) This does nothing anymore; we removed clustering because it's actually more efficient to let nginx do the load balancing (see the sketch below).
b) It often came about from people seeing the "..." in the UI; that was because we had some code in the streaming server that regularly caused huge GC pauses, which in turn caused streaming to pause.
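A minimal sketch of what "let nginx do the load balancing" can look like with several streaming processes — the upstream name, ports, and least_conn choice are illustrative assumptions, not Mastodon's shipped config:

```nginx
# Spread streaming/WebSocket connections across multiple streaming processes.
upstream streaming {
    least_conn;                              # prefer the least-loaded backend
    server 127.0.0.1:4000 fail_timeout=0;
    server 127.0.0.1:4001 fail_timeout=0;
    server 127.0.0.1:4002 fail_timeout=0;
}

# The existing /api/v1/streaming location block then proxies to this upstream.
```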
#mastoadmin
Emelia
in reply to Emelia
e.g., we'd receive one JSON message from Redis and then parse it N times to send it to N clients, when we should have parsed it once. The extra parsing created millions of additional objects that all needed to be garbage collected by Node.js.
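A simplified sketch of that fix (not the actual Mastodon code — the channel name and callback shape are illustrative assumptions):

```typescript
import Redis from 'ioredis';

type Deliver = (message: unknown) => void;

const subscriber = new Redis();
const listeners = new Map<string, Set<Deliver>>();

void subscriber.subscribe('timeline:public');

subscriber.on('message', (channel, raw) => {
  const clients = listeners.get(channel);
  if (!clients || clients.size === 0) return;

  // Before: each client callback parsed `raw` itself, so one Redis message
  // produced N parsed copies and N batches of short-lived garbage.
  // After: parse once, hand the same object to every client.
  const message = JSON.parse(raw);
  for (const deliver of clients) {
    deliver(message);
  }
});
```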
Here's what event loop lag and GC look like on another instance I had data from:
#mastoadmin
Emelia
in reply to Emelia
In the future we'll hopefully be adding OpenTelemetry traces to streaming, and also further improving performance by moving database queries whose results don't change often out of the message sending loop.
This will consume a bit more memory (since we'll cache that data in memory) but it'll reduce load on the database.
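A minimal sketch of that kind of in-memory caching, assuming a generic async database lookup — the names and TTL are illustrative, not Mastodon's actual code:

```typescript
type Entry<T> = { value: T; expiresAt: number };

// Tiny TTL cache: trade a little memory for far fewer database round-trips
// inside the hot message-sending path.
class TtlCache<T> {
  private entries = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  async get(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value;

    const value = await load(); // miss or stale: query the database once
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Hypothetical usage in the sending loop:
// const filters = await filterCache.get(accountId, () => loadFiltersFromDb(accountId));
```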
We're actually looking for help with OTel: github.com/mastodon/mastodon/i…
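For anyone tempted to pick that issue up, a minimal, generic sketch of bootstrapping OpenTelemetry tracing in a Node.js service — the package names are the standard OTel ones, but exactly how streaming should wire this up is what the issue is about:

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'mastodon-streaming',                  // illustrative service name
  traceExporter: new OTLPTraceExporter(),             // OTLP over HTTP to a local collector
  instrumentations: [getNodeAutoInstrumentations()],  // http, ioredis, pg, etc.
});

sdk.start();
```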
#mastoadmin
OpenTelemetry support for Streaming · Issue #32673 · mastodon/mastodon
Emelia
in reply to Emelia
The work on the streaming server was largely done in 2023, in honour of @nova, after she'd shown on a stream that something really weird was up with streaming.
Turned out it was issues I'd discovered in 2018 but hadn't had time to fix.
So I spent a tonne of my own time (I wasn't yet paid as much by the community) to improve the streaming server significantly.
This refactoring is still ongoing, but I'm balancing it with other work, like FIRES and improving the moderation UI.
#mastoadmin
Emelia
in reply to Emelia
If you'd like to support my work in the fediverse, you can do so at:
support.thisismissem.social
I'm also starting to offer some reasonably priced support contracts for instances who'd like me to hang out in their chat & help them investigate bugs, performance issues, or confusing stuff in the moderation tools.
For that, use the "contact me" option in the link above.