Friendica Social Network

Elena Rossini ⁂

2 weeks ago

Yesterday my VPS set off a warning, as it was hit by a huge spike in incoming traffic, peaking at 55GB at 2:15pm and lasting for an hour.

Upon investigating, it turns out it was my PeerTube instance that was targeted.

Where did the traffic come from?

meta-externalagent (aka Meta's web crawler which is used to grab content to train its AI system).

I feel a little bit violated thinking my Fediverse promo video was grabbed by it, sigh.

#AIcritic #NoAI

a screenshot of my VPS dashboard showing little traffic and then a huge spike at 14:15 local time showing 55 GB in incoming traffic
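One way to confirm which client is behind a spike like this is to total the response sizes per user agent in the web server's access log. Below is a minimal Python sketch. It assumes the default "combined" log format that nginx and Apache use. The function name is hypothetical, and the sample log lines in the test are made up.

```python
import re
from collections import Counter

# Combined Log Format (assumed):
# ip - user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def bytes_by_agent(lines, markers):
    """Sum response bytes per user-agent marker; anything unmatched counts as 'other'."""
    totals = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        size = 0 if m["bytes"] == "-" else int(m["bytes"])  # Apache logs "-" for 0 bytes
        agent = m["agent"].lower()
        label = next((mk for mk in markers if mk.lower() in agent), "other")
        totals[label] += size
    return totals
```

Calling `bytes_by_agent(open(path), ["meta-externalagent"]).most_common()` would rank the crawler against everything else. Here `path` stands in for wherever your access log lives.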

Lisa Melton reshared this.

Mitex Leo

in reply to Elena Rossini ⁂ • 2 weeks ago

I was forced to take down my SearXNG instance because of these stupid bots.

Andy Piper

in reply to Mitex Leo • 2 weeks ago

@ml I have mine behind Aurelia so only I can use my own searXNG

Ben Hardill

in reply to Andy Piper • 2 weeks ago

@andypiper @ml
I got hit by this as well last week: 30% of all hits in the last 14 days came from the bot.

I've not had any response from the email address they published on their bot page, so all those requests are getting 301'd to a 100GiB gzip bomb for now

blog.hardill.me.uk/2026/03/12/…

WTF is Facebook doing?

I’ve been running my HTTP server logs through a tool called goaccess for a while, this tool generates a bunch of charts showing traffic volume and how it breaks down by the different hostnames…

Ben's Place
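A gzip bomb works because long runs of identical bytes compress extremely well. DEFLATE tops out at roughly 1000:1, so a body that expands to 100 GiB is only on the order of 100 MiB on disk. Below is a small Python sketch that builds one with the standard `gzip` module. It illustrates the idea only and is not Ben's actual setup. The function name is hypothetical, and the sizes are scaled down.

```python
import gzip
import io

def write_gzip_bomb(out, decompressed_size, chunk_size=1 << 20):
    """Write gzip data to the binary file object `out` that expands to
    `decompressed_size` zero bytes, compressing one chunk at a time so the
    uncompressed payload never has to exist in memory."""
    zeros = bytes(chunk_size)  # one reusable chunk of 0x00 bytes
    with gzip.GzipFile(fileobj=out, mode="wb", compresslevel=9) as gz:
        remaining = decompressed_size
        while remaining > 0:
            n = min(chunk_size, remaining)
            gz.write(zeros if n == chunk_size else zeros[:n])
            remaining -= n

if __name__ == "__main__":
    buf = io.BytesIO()
    size = 16 * 1024 * 1024  # 16 MiB here; the bomb in the post is 100 GiB
    write_gzip_bomb(buf, size)
    print(f"{size} bytes -> {len(buf.getvalue())} bytes gzipped")
```

If the server sends this file with a `Content-Encoding: gzip` header, any client that decompresses the body transparently has to inflate the full size. The server only ever sends the compressed bytes. For the 100 GiB case you would pass an open file instead of `BytesIO`.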

computer maus

in reply to Andy Piper • 2 weeks ago

@andypiper @ml my searxng has been fine so far (at least to my knowledge), but thanks for the heads up, i should really put it behind my sso!

Mitex Leo

in reply to computer maus • 2 weeks ago

@kate @andypiper I was running a public instance. Also didn't use Cloudflare as requested by some users.

computer maus

in reply to Mitex Leo • 2 weeks ago

@ml @andypiper mine is public and i also don't use cloudflare (just my own vps with wireguard for tunneling the traffic)

Jools

in reply to Elena Rossini ⁂ • 2 weeks ago from Friendica.de

@Elena Rossini ⁂ You can block such AI crawlers with a robots.txt file. If the crawlers don't comply, you can also use Fail2Ban.
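robots.txt only works if the crawler chooses to honour it, which is why a fallback like Fail2Ban comes up. You can sanity-check a rule set with Python's standard `urllib.robotparser`, which evaluates the rules the way a compliant crawler should. The sketch below uses a hypothetical robots.txt that shuts out `meta-externalagent` and allows everyone else. `example.org` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Meta's AI crawler site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

if __name__ == "__main__":
    for agent in ("meta-externalagent/1.1", "Mozilla/5.0"):
        print(agent, parser.can_fetch(agent, "https://example.org/videos/watch/abc"))
```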

Chuckles

in reply to Elena Rossini ⁂ • 2 weeks ago

on the plus side, Meta's LLMs are so gullible they might start extolling the Fediverse.

D1re_W0lf ⁂🇪🇺🇵🇹

Unknown parent • 2 weeks ago

@jools You might try Cloudflare protection for that.
Or the self-hosted equivalent, Pangolin + CrowdSec.
If you are really into it, you can add Anubis as an extra layer.

Jools

Unknown parent • 2 weeks ago from Friendica.de

@Elena Rossini ⁂ I know, I had that problem too. I got a good tip from @Rainer "friendica" Sokoll the other day. This helped me and others immediately:

rainer.sokoll.com/?p=8353

MFierst

in reply to Elena Rossini ⁂ • 2 weeks ago

I can imagine that is a terrible feeling.

sam

in reply to Elena Rossini ⁂ • 2 weeks ago

dang I hope I didn't trigger anything by sharing your video on Facebook. I'm just trying to get some friends and family to come to the fediverse and hopefully delete Facebook (again).

Jools

Unknown parent • 2 weeks ago from Friendica.de

@Elena Rossini ⁂ Maybe this helps: apps.yunohost.org/app/fail2ban…

Ben Hardill

Unknown parent • 2 weeks ago

it may be because of the "/." (Slashdot) effect, as the site is running on a Pi... give it a few mins for the load to die down a little

RichBartlett

in reply to Elena Rossini ⁂ • 2 weeks ago

not sure if you've seen this bluetoot.hardill.me.uk/@ben/11…, I particularly like his response of using a 301 redirect to a massive file!

Ben Hardill

2026-03-17 09:48:53

nathan

Unknown parent • 2 weeks ago

you can have a look at CrowdSec too, as an alternative to Fail2Ban. Their docs are good as far as I remember, but everything requires the CLI to set up.
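Fail2Ban applies a simple rule to matching log lines. It counts hits per IP inside a time window and bans the IP once a threshold is crossed. Below is a minimal Python sketch of just that decision logic. The parameter names echo Fail2Ban's `maxretry`, `findtime` and `bantime` settings. The class itself is hypothetical and does not touch the firewall.

```python
from collections import defaultdict, deque

class SlidingWindowBanner:
    """Ban an IP once it produces `max_hits` matching requests within
    `find_time` seconds; the ban lasts `ban_time` seconds. Illustrative only,
    not Fail2Ban's implementation."""

    def __init__(self, max_hits, find_time, ban_time):
        self.max_hits = max_hits
        self.find_time = find_time
        self.ban_time = ban_time
        self.hits = defaultdict(deque)  # ip -> timestamps of recent matches
        self.banned_until = {}          # ip -> time the ban expires

    def is_banned(self, ip, now):
        return self.banned_until.get(ip, float("-inf")) > now

    def record(self, ip, now):
        """Register one matching request at time `now`; return True if the IP is banned."""
        if self.is_banned(ip, now):
            return True
        window = self.hits[ip]
        window.append(now)
        # Drop matches that fell out of the find_time window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_hits:
            self.banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False
```

In a real setup, the "ban" step would add a firewall rule, and the timestamps would come from parsed log lines rather than being passed in by hand.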

RichBartlett

Unknown parent • 2 weeks ago

here's an archive archive.is/Edfen

狐ヴィクシー

in reply to Elena Rossini ⁂ • 2 weeks ago

Maybe Meta's AI bots might finally start giving people good advice.

Sylvia

in reply to Elena Rossini ⁂ • 2 weeks ago

ugh. That’s just so aggravating. I have read several people mention that the Meta bot is being aggressive and crashing sites.

That they can so blatantly steal data is just…

Really hope that the EU is going to do something about their theft.

Marian Scales

in reply to Elena Rossini ⁂ • 2 weeks ago

Ew. Gross. I feel icky and violated just reading this.

Thom

in reply to Elena Rossini ⁂ • 2 weeks ago

They’re doing that on purpose. My hosting provider has already contacted me to say that my site (SearXNG) is causing major traffic issues. Because of this, many small instances may have to be taken offline again. It’s like a digital war...

RootHosts

in reply to Elena Rossini ⁂ • 2 weeks ago

that’s frustrating, especially when it spikes traffic like that without warning.

I’m a Linux/Windows system administrator, and this kind of load can be managed. You can limit or block such crawlers and also protect your VPS with anti-DDoS, rate limiting, and traffic filtering.

If you want, I can help you secure and optimize your setup, or we can provide a VPS with built-in protection.
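Per-client rate limiting is commonly done with a token bucket. Each IP gets a small burst allowance that refills at a fixed rate, and any requests beyond it are refused. nginx's `limit_req` module uses the related leaky-bucket method. Below is a minimal in-memory Python sketch. The class names and numbers are hypothetical. Per-IP limits help little when a crawler spreads its requests over many addresses.

```python
class TokenBucket:
    """Holds up to `capacity` tokens and refills at `rate` tokens per second."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so a client can burst
        self.updated = None            # time of the last request seen

    def allow(self, now):
        """Spend one token if available; `now` is a time in seconds."""
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerIpLimiter:
    """One token bucket per client IP (hypothetical helper; no eviction of old IPs)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, ip, now):
        if ip not in self.buckets:
            self.buckets[ip] = TokenBucket(self.rate, self.capacity)
        return self.buckets[ip].allow(now)
```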

Mastodon Migration

in reply to Elena Rossini ⁂ • 2 weeks ago

Can't understand much of this thread, but get the gist. Seems like the rebel alliance at work. You guys are wonderful!

Scott Starkey

in reply to Elena Rossini ⁂ • 2 weeks ago

I've been considering setting up a PeerTube site for my personal videos. Is there any defense against AI bots doing a DDoS? Can they be pre-perma-banned?

Ed

in reply to Elena Rossini ⁂ • 2 weeks ago

Would you be able to use a user agent block list like ai.robots.txt? I have a cron job that updates it daily from their git repo and then restarts nginx.

Except I strip out the part that refers known agents to robots.txt and just give them a 403, because none of them ever honor the robots file anyway.

github.com/ai-robots-txt/ai.ro…

GitHub - ai-robots-txt/ai.robots.txt: A list of AI agents and robots to block.

A list of AI agents and robots to block. Contribute to ai-robots-txt/ai.robots.txt development by creating an account on GitHub.

GitHub
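Ed's script isn't shown, but the 403 rule itself could look like the snippet rendered by the hypothetical Python helper below. It is a single case-insensitive `$http_user_agent` match in nginx that returns 403. The four agent names are just a sample of well-known AI crawler user agents, not the contents of the ai.robots.txt list. The helper assumes no name contains a double quote.

```python
import re

# Example subset of well-known AI crawler user agents (not the ai.robots.txt list).
AGENTS = ["meta-externalagent", "GPTBot", "ClaudeBot", "CCBot"]

def agent_regex(agents):
    """Alternation of the escaped agent names, e.g. (GPTBot|CCBot)."""
    return "(" + "|".join(re.escape(a) for a in agents) + ")"

def nginx_block_rule(agents):
    """Render an nginx `if` block that answers matching user agents with 403."""
    return (
        f'if ($http_user_agent ~* "{agent_regex(agents)}") {{\n'
        "    return 403;\n"
        "}\n"
    )

def is_blocked(user_agent, agents):
    """Python mirror of the nginx rule: case-insensitive regex search."""
    return re.search(agent_regex(agents), user_agent, re.IGNORECASE) is not None

if __name__ == "__main__":
    print(nginx_block_rule(AGENTS))
```

A daily cron job like Ed's could regenerate this snippet into a file that is included from the `server` block before reloading nginx.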

Chocobozzz

Unknown parent • 1 week ago

@ScottStarkey @Framasoft There's rate limiting per IP, but AI bots use so many residential IPs that it's hard to protect yourself from them. Cloudflare or a service like that can help, I think, but I've never tried them.

Elena Rossini ⁂

in reply to Chocobozzz • 1 week ago

@Chocobozzz thank you! 🙏

@ScottStarkey @Framasoft

Elena Rossini ⁂

Unknown parent • 1 week ago

@sylvie thanks! I have investigated whether I could use Anubis, but it would mess up my YunoHost installation.

I need to see if I can use BunnyCDN instead (I already use it for my website).

@Chocobozzz @ScottStarkey @Framasoft

ylvie

in reply to Chocobozzz • 1 week ago

Not 1:1, but OpenStreetMap is using Fastly as an anti-scraper SaaS solution, which is perhaps less likely to draw ire. Some fediverse operators are very opinionated about anyone who uses Cloudflare, to the point of defederating. Every Invidious instance that I’ve come across uses Anubis as a mitigation, and there are a few Caddy mitigation solutions out there too.
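Anubis-style gates make each visitor's browser do a small proof-of-work before the page is served. The cost is trivial for a single human visit but adds up across millions of scraper requests. Below is a generic Python sketch of a SHA-256 "leading zeros" puzzle that illustrates the mechanism. It is not Anubis's code or protocol, and the function names are hypothetical.

```python
import hashlib
import secrets

def new_challenge():
    """Server side: issue a random challenge string (32 hex characters)."""
    return secrets.token_hex(16)

def solve(challenge, difficulty):
    """Client side: brute-force a nonce whose SHA-256 hex digest of
    challenge+nonce starts with `difficulty` zeros (about 16**difficulty tries)."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

def verify(challenge, nonce, difficulty):
    """Server side: a single hash checks work that cost the client many."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the point of the design. Raising `difficulty` by one multiplies the client's expected work by 16, while verification on the server stays at one hash.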

PaulH

in reply to Elena Rossini ⁂ • 1 week ago

that's so stupid... 🫤

Here's a repo that blocks AI crawlers at the web server level, in this case Apache: codeberg.org/creatura85/htacce…
There's probably a similar repo for nginx as well?

htaccess

Keeping AI companies from copyright-violating a website for LLM training is difficult, but not impossible. I got pretty far using Apache .htaccess.

Codeberg.org

This entry was edited (1 week ago)
