Blocking AI crawlers with Bunny.net edge rules

Written by: Matsimitsu (Robert Beekman)

Last week, I got a notice from Bunny.net (the CDN I use to host all my assets) that my card had been charged. That was odd, because I should have had a few dollars of credit left, and on average I pay about $1 per month.

A quick look at the statistics showed quite a bit of traffic on my CDN subdomain. That was suspicious, because I haven't been on a big trip recently, and trips are usually what generate traffic.

A dive into the logs revealed that something called "Bytespider" was crawling every single asset on my blog. (For reference, I currently have 41,187 assets hosted with Bunny.) A few other crawlers were joining the party too, mostly unknown AI spiders.

I write my content for myself, to read back later on a rainy afternoon, and I share it with others so they can use it to research their own trips. I'm not sharing it to feed the AI craze, and I'm certainly not gonna pay for the bandwidth.

Luckily, Bunny.net supports edge rules that can block or re-route traffic based on headers; you'll find them under "Edge Rules" in the CDN navigation.

What we want is to block all traffic where the User-Agent header contains any of the common synonyms for "bot," so we create a new edge rule that matches the User-Agent header against any of these patterns:

  • *spider*

  • *bot*

  • *crawl*

  • *ai*

Matching is case-insensitive, and these four patterns catch most crawlers out there.
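If you want to sanity-check which User-Agent strings the patterns will catch before turning the rule on, here's a small Python sketch that approximates the same case-insensitive wildcard matching. This is just an illustration, not Bunny's actual rule engine; the pattern list and function name are made up for the example:

```python
import fnmatch

# A minimal sketch of the matching logic, mirroring the edge rule's
# wildcard patterns. Lowercasing both sides makes the match
# case-insensitive regardless of platform.
BLOCK_PATTERNS = ["*spider*", "*bot*", "*crawl*", "*ai*"]

def is_blocked(user_agent: str) -> bool:
    """True if the User-Agent matches any block pattern (case-insensitive)."""
    ua = user_agent.lower()
    return any(fnmatch.fnmatch(ua, pattern) for pattern in BLOCK_PATTERNS)

print(is_blocked("Bytespider"))  # True, via *spider*
print(is_blocked("GPTBot/1.0"))  # True, via *bot*
print(is_blocked("curl/8.4.0"))  # False
```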

I still check my logs every once in a while to see if any new User-Agent headers show up, or if I'm blocking too much, but so far this has worked out really well.
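To make that occasional log check quicker, a tiny script that tallies User-Agent values does the job. A rough sketch, assuming you've already downloaded the raw logs and extracted one User-Agent string per line (the export steps and field layout depend on your logging setup):

```python
import sys
from collections import Counter

# Hypothetical helper for log monitoring: reads one User-Agent string per
# line from stdin and prints the most frequent ones, so new crawlers and
# over-broad blocks stand out.
counts = Counter(line.strip() for line in sys.stdin if line.strip())

for user_agent, hits in counts.most_common(20):
    print(f"{hits:>8}  {user_agent}")

# Usage: python tally_ua.py < user_agents.txt
```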
