So remember how I was ranting about everyone dramatically over-provisioning Mastodon's streaming server?
Well, for an idea of how well it scales now that we've fixed some really poorly performing code: mastodon.social runs 3 instances of the streaming server supporting ~10k concurrent connections.
Yeah, ~3k connections per pod is a little high, but it manages to handle it. Event loop lag & GC lag may be higher with that many concurrent connections.
Emelia
in reply to Emelia
Generally I'd aim closer to ~2k concurrent per pod as a maximum.
That way a pod dying takes out 1/5th of the connections instead of 1/3rd (with ~10k connections, that's five pods at ~2k each rather than three at ~3.3k), because handling those reconnections is the most expensive part.
If you're looking at the metrics that streaming exposes and constantly seeing only <1000 concurrent connections, you probably only need a single instance, maybe two for failover.
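As a rough sketch of that sizing rule (the ~2k-per-pod target is the figure from this thread, not a hard limit, and the peak number is whatever your own metrics report):

```typescript
// Back-of-the-envelope sizing for streaming instances.
// peakConnections comes from your own monitoring; 2_000 is the per-pod target above.
const peakConnections = 10_000;
const targetPerPod = 2_000;

const streamingInstances = Math.ceil(peakConnections / targetPerPod); // => 5
console.log(`Run ${streamingInstances} streaming instances`);
```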
#mastoadmin
Emelia
in reply to Emelia
So if you're reading anything old about setting STREAMING_CLUSTER_NUM high for "performance":
a) This does nothing anymore; we removed clustering because it's actually more efficient to let nginx do the load balancing (see the sketch below).
b) It often came about from people seeing the "..." in the UI; that was because we had some code in the streaming server that regularly caused huge GC pauses, which in turn caused streaming to pause.
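A minimal sketch of what "let nginx do the load balancing" can look like with several streaming processes — the upstream name, ports, and least_conn choice are illustrative assumptions, not Mastodon's shipped config:

```nginx
# Spread streaming/WebSocket connections across multiple streaming processes.
upstream streaming {
    least_conn;                              # prefer the least-loaded backend
    server 127.0.0.1:4000 fail_timeout=0;
    server 127.0.0.1:4001 fail_timeout=0;
    server 127.0.0.1:4002 fail_timeout=0;
}

# The existing /api/v1/streaming location block then proxies to this upstream.
```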
#mastoadmin
Emelia
in reply to Emelia
e.g., we'd receive one JSON message from Redis and then parse it N times to send it to N clients, when we should have parsed it once. The extra parsing created millions of additional objects that all needed to be garbage collected by Node.js.
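A simplified sketch of that fix (not the actual Mastodon code — the channel name and callback shape are illustrative assumptions):

```typescript
import Redis from 'ioredis';

type Deliver = (message: unknown) => void;

const subscriber = new Redis();
const listeners = new Map<string, Set<Deliver>>();

void subscriber.subscribe('timeline:public');

subscriber.on('message', (channel, raw) => {
  const clients = listeners.get(channel);
  if (!clients || clients.size === 0) return;

  // Before: each client callback parsed `raw` itself, so one Redis message
  // produced N parsed copies and N batches of short-lived garbage.
  // After: parse once, hand the same object to every client.
  const message = JSON.parse(raw);
  for (const deliver of clients) {
    deliver(message);
  }
});
```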
Here's what event loop lag and GC look like on another instance I had data from:
#mastoadmin
Emelia
in reply to Emelia
In the future we'll hopefully be adding OpenTelemetry traces to streaming, and also further improving performance by moving database queries whose results don't change often out of the message sending loop.
This will consume a bit more memory (since we'll cache that data in memory) but it'll reduce load on the database.
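A minimal sketch of that kind of in-memory caching, assuming a generic async database lookup — the names and TTL are illustrative, not Mastodon's actual code:

```typescript
type Entry<T> = { value: T; expiresAt: number };

// Tiny TTL cache: trade a little memory for far fewer database round-trips
// inside the hot message-sending path.
class TtlCache<T> {
  private entries = new Map<string, Entry<T>>();
  constructor(private ttlMs: number) {}

  async get(key: string, load: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value;

    const value = await load(); // miss or stale: query the database once
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
    return value;
  }
}

// Hypothetical usage in the sending loop:
// const filters = await filterCache.get(accountId, () => loadFiltersFromDb(accountId));
```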
We're actually looking for help with OTel: github.com/mastodon/mastodon/i…
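For anyone tempted to pick that issue up, a minimal, generic sketch of bootstrapping OpenTelemetry tracing in a Node.js service — the package names are the standard OTel ones, but exactly how streaming should wire this up is what the issue is about:

```typescript
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';

const sdk = new NodeSDK({
  serviceName: 'mastodon-streaming',                  // illustrative service name
  traceExporter: new OTLPTraceExporter(),             // OTLP over HTTP to a local collector
  instrumentations: [getNodeAutoInstrumentations()],  // http, ioredis, pg, etc.
});

sdk.start();
```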
#mastoadmin
OpenTelemetry support for Streaming · Issue #32673 · mastodon/mastodon
Emelia
in reply to Emelia
The work on the streaming server was largely done in 2023, in honour of @nova, after she'd shown on a stream that something really weird was up with streaming.
Turned out it was issues I'd discovered in 2018 but hadn't had time to fix.
So I spent a tonne of my own time (I wasn't yet paid as much by the community) to improve the streaming server significantly.
This refactoring is still ongoing, but I'm balancing it with other work, like FIRES and improving the moderation UI.
#mastoadmin
Emelia
in reply to Emelia
If you'd like to support my work in the fediverse, you can do so at:
support.thisismissem.social
I'm also starting to offer some reasonably priced support contracts for instances who'd like me to hang out in their chat & help them investigate bugs, performance issues, or confusing stuff in the moderation tools.
For that, use the "contact me" option in the link above.