

Yesterday my VPS set off a warning, as it was hit by a huge spike in incoming traffic, peaking at 55GB at 2:15pm and lasting for an hour.

Upon investigating, it turns out it was my PeerTube instance that was targeted.

Where did the traffic come from?

meta-externalagent (aka Meta's web crawler, which is used to grab content to train its AI system).

I feel a little bit violated thinking my Fediverse promo video was grabbed by it, sigh.

#AIcritic #NoAI

Lisa Melton reshared this.

in reply to Elena Rossini ⁂

I was forced to take down my SearXNG instance because of these stupid bots.
in reply to Mitex Leo

@ml I have mine behind Authelia so only I can use my own SearXNG
in reply to Andy Piper

@andypiper @ml
I got hit by this as well last week: 30% of all hits in the last 14 days came from the bot.

I've not had any response from the email address they published on their bot page, so all those requests are getting 301'd to a 100GiB gzip bomb for now.

blog.hardill.me.uk/2026/03/12/…
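
If you want to check a figure like the "30% of all hits" above on your own server, here is a minimal Python sketch. It assumes an nginx or Apache access log in the common "combined" format, where the User-Agent is the last double-quoted field on each line. The function name `bot_share` is hypothetical. The function only counts what fraction of parsed lines carry a given User-Agent substring.

```python
import re
from typing import Iterable

# Assumption: "combined" log format, so the User-Agent is the last quoted field.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')

def bot_share(lines: Iterable[str], needle: str = "meta-externalagent") -> float:
    """Return the fraction of parsed requests whose User-Agent contains `needle`."""
    total = 0
    matched = 0
    for line in lines:
        m = UA_FIELD.search(line)
        if m is None:
            continue  # skip lines that don't look like combined-format entries
        total += 1
        if needle.lower() in m.group(1).lower():  # case-insensitive substring match
            matched += 1
    return matched / total if total else 0.0
```

For example, `bot_share(open("access.log"))` would return about 0.3 in the 30% case. The path is illustrative, and rotated or gzipped logs would need to be opened separately.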

in reply to Andy Piper

@andypiper @ml my searxng has been fine so far (at least to my knowledge), but thanks for the heads up, i should really put it behind my sso!
in reply to computer maus

@kate @andypiper I was running a public instance. I also didn't use Cloudflare, as requested by some users.
in reply to Mitex Leo

@ml @andypiper mine is public and i also don't use cloudflare (just my own vps with wireguard for tunneling the traffic)
in reply to Elena Rossini ⁂

@Elena Rossini ⁂ You can block such AI crawlers with a robots.txt file. If the crawlers don't comply, you can also use Fail2Ban.
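
The robots.txt approach just means listing the crawler's User-agent token and disallowing everything. The minimal Python sketch below generates such a file. Only `meta-externalagent` comes from this thread. The other names (GPTBot, CCBot, ClaudeBot) are an illustrative, non-exhaustive assumption, and `robots_txt` is a hypothetical helper name. Keep in mind that robots.txt is purely advisory, so a crawler can simply ignore it.

```python
# Illustrative, non-exhaustive list; only meta-externalagent comes from this thread.
AI_CRAWLERS = ["meta-externalagent", "GPTBot", "CCBot", "ClaudeBot"]

def robots_txt(agents: list[str]) -> str:
    """Return a robots.txt group that disallows the whole site for `agents`."""
    # RFC 9309 allows several User-agent lines to share one group of rules.
    lines = [f"User-agent: {agent}" for agent in agents]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Write this output to robots.txt at the web root of the site.
    print(robots_txt(AI_CRAWLERS), end="")
```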
in reply to Elena Rossini ⁂

on the plus side, Meta's LLMs are so gullible they might start extolling the Fediverse.

D1re_W0lf ⁂🇪🇺🇵🇹
@jools You might try Cloudflare protection for that.
Or the self-hosted equivalent, Pangolin + CrowdSec.
If you are really into it, you can add Anubis as an extra layer.

Jools

@Elena Rossini ⁂ I know, I had that problem too. I got a good tip from @Rainer "friendica" Sokoll the other day. This helped me and others immediately:

rainer.sokoll.com/?p=8353

in reply to Elena Rossini ⁂

Dang, I hope I didn't trigger anything by sharing your video on Facebook. I'm just trying to get some friends and family to come to the Fediverse and hopefully delete Facebook (again).

Ben Hardill
It may be because of the "/." (Slashdot) effect, as the site is running on a Pi... give it a few minutes for the load to die down a little.
in reply to Elena Rossini ⁂

Not sure if you've seen this: bluetoot.hardill.me.uk/@ben/11… I particularly like his response of using a 301 redirect to a massive file!

nathan
You can have a look at CrowdSec too, as an alternative to Fail2Ban. Their docs are good as far as I remember, but everything requires the CLI to set up.
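
The core idea behind Fail2Ban (and, loosely, CrowdSec's scenarios) is simple: too many matching requests from one IP within a time window earns that IP a temporary ban. The minimal in-memory Python sketch below illustrates that idea. `SlidingWindowBanner` and its parameters are hypothetical names that only loosely correspond to Fail2Ban's `maxretry`, `findtime` and `bantime` settings. Neither tool is actually implemented this way.

```python
from collections import defaultdict, deque

class SlidingWindowBanner:
    """Ban an IP for `ban_for` seconds once it exceeds `max_hits` within `window` seconds."""

    def __init__(self, max_hits: int, window: float, ban_for: float):
        self.max_hits = max_hits
        self.window = window
        self.ban_for = ban_for
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests
        self.banned_until = {}           # ip -> time the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        until = self.banned_until.get(ip)
        return until is not None and now < until

    def record(self, ip: str, now: float) -> bool:
        """Record a request at time `now`; return True if the IP is (now) banned."""
        if self.is_banned(ip, now):
            return True
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) > self.max_hits:
            self.banned_until[ip] = now + self.ban_for
            q.clear()  # start fresh once the ban expires
            return True
        return False
```

In a real setup, the "ban" would be a firewall rule rather than a boolean. That enforcement step is what Fail2Ban and CrowdSec's bouncers take care of.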

in reply to Elena Rossini ⁂

Ugh. That’s just so aggravating. I have seen several people mention that the Meta bot is being aggressive and crashing sites.

That they can so blatantly steal data is just…

I really hope that the EU is going to do something about their theft.

in reply to Elena Rossini ⁂

They’re doing that on purpose. My hosting provider has already contacted me to say that my site (SearXNG) is causing major traffic issues. Because of this, many small instances may have to be taken offline again. It’s like a digital war...
in reply to Elena Rossini ⁂

That’s frustrating, especially when it spikes traffic like that without warning.

I’m a Linux/Windows system administrator, and this kind of load can be managed. You can limit or block such crawlers and also protect your VPS with anti-DDoS, rate limiting, and traffic filtering.

If you want, I can help you secure and optimize your setup, or we can provide a VPS with built-in protection.

in reply to Elena Rossini ⁂

Can't understand much of this thread, but get the gist. Seems like the rebel alliance at work. You guys are wonderful!
in reply to Elena Rossini ⁂

I've been considering setting up a PeerTube site for my personal videos. Is there any defense against AI bots doing a DDoS? Can they be pre-perma-banned?
in reply to Elena Rossini ⁂

Would you be able to use a user agent block list like ai.robots.txt? I have a cron job that updates it daily from their git repo and then restarts nginx.

Except I strip out the part that refers known agents to robots.txt and just give them a 403, because none of them ever honor the robots file anyway.

github.com/ai-robots-txt/ai.ro…
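
Here is a minimal Python sketch of the "just give them a 403" approach. It turns a list of crawler names into an nginx `map` block. The sketch assumes you have already extracted plain agent names from the ai.robots.txt repo; the repo's own file layout is not assumed here. `nginx_block_map` and `$block_ai_bot` are hypothetical names. To keep the generated config safe to parse, names containing anything other than letters, digits, dots, underscores or hyphens are skipped.

```python
import re

# Only accept names that need no quoting inside an nginx config file.
SAFE_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

def nginx_block_map(agents: list[str], var: str = "$block_ai_bot") -> str:
    """Build an nginx `map` block that sets `var` to 1 for matching User-Agents."""
    lines = [f"map $http_user_agent {var} {{", "    default 0;"]
    for name in agents:
        if not SAFE_NAME.match(name):
            continue  # skip names that would need extra quoting/escaping
        pattern = name.replace(".", r"\.")  # match dots literally in the regex
        lines.append(f"    ~*{pattern} 1;")  # ~* = case-insensitive regex match
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The output belongs in the `http` context. Adding `if ($block_ai_bot) { return 403; }` inside the relevant `server` block then rejects those agents. A daily cron job like the one described above could regenerate the file, run `nginx -t`, and reload nginx only if the test passes.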

Chocobozzz
@ScottStarkey @Framasoft There's rate limiting per IP, but AI bots use many residential IPs, so it's hard to protect yourself from them. Cloudflare or a similar service can help, I think, but I've never tried them.
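
To see why per-IP rate limiting struggles against residential IP pools, it helps to look at what a per-IP limiter actually tracks. Below is a minimal token-bucket sketch in Python. The class and parameter names are hypothetical, and this is not PeerTube's actual implementation. Each IP gets its own bucket of `capacity` tokens, refilled at `rate` tokens per second, and each request spends one token.

```python
from dataclasses import dataclass, field

@dataclass
class TokenBucketLimiter:
    """Per-IP token bucket: burst up to `capacity` requests, refilled at `rate` per second."""
    rate: float
    capacity: float
    # ip -> (tokens currently available, time of last update)
    buckets: dict[str, tuple[float, float]] = field(default_factory=dict)

    def allow(self, ip: str, now: float) -> bool:
        # A first-seen IP starts with a full bucket.
        tokens, last = self.buckets.get(ip, (self.capacity, now))
        # Refill for the time elapsed since this IP was last seen, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

A single IP hammering the server runs out of tokens quickly. A crawler spread over 100 residential IPs, each sending one request, is allowed 100 times, because every IP has its own full bucket. That is exactly the weakness described above.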

Elena Rossini ⁂

@sylvie thanks! I have investigated whether I could use Anubis, but it would mess up my YunoHost installation.

I need to see if I can use BunnyCDN instead (I already use it for my website)

@Chocobozzz @ScottStarkey @Framasoft

in reply to Chocobozzz

Not 1:1, but OpenStreetMap is using Fastly as an anti-scraper SaaS solution, which is perhaps less likely to draw ire. Some Fediverse operators are very opinionated about anyone who uses Cloudflare, to the point of defederating. Every Invidious instance that I’ve come across uses Anubis as a mitigation, and there are a few Caddy mitigation solutions out there too.
in reply to Elena Rossini ⁂

that's so stupid... 🫤

Here's a repo that blocks AI crawlers at the web server level, in this case Apache: codeberg.org/creatura85/htacce…
There's probably a similar repo for nginx as well?
