Setting Up Anubis on FreeBSD
I've noticed a few times recently that my Forgejo server has been triggering alerts
on my monitoring systems because it isn't responding as fast as it normally does.
Lo and behold - it's due to AI crawlers causing it to grind to a halt as they scrape every
little section of the service for their own needs.
This is how I implemented Anubis to help mitigate this problem.
Server Setup
For those that don't know, Forgejo is a software forge; think of it as a git... hub that you can host yourself. I run my Forgejo instance on a FreeBSD server in a jail, and it sits behind an HAProxy reverse proxy which also runs in a jail.
I monitor the uptime of this service, and many others, using Gatus, which is a brilliantly simple infrastructure monitoring system that can send you alerts dependent on various factors, e.g. connectivity loss, certificate expiration, and, crucially, response time.
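As a rough sketch of the kind of check involved (the domain, interval, and threshold here are hypothetical, not my actual configuration), a Gatus endpoint that alerts on slow responses looks something like this:

```yaml
# gatus config sketch - endpoint with a response-time condition
endpoints:
  - name: forgejo
    url: "https://forge.example.com"
    interval: 60s
    conditions:
      - "[STATUS] == 200"
      - "[RESPONSE_TIME] < 500"   # alert if slower than 500ms
```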
The Problem
Over the last few months, I have noticed a significant increase in alerts due to poor response times. A quick check of the HAProxy logs shows a huge amount of traffic from AI crawlers. Something like the below:
...thousands of lines before...
Feb 19 21:47:39 192.168.200.5 haproxy[2334]: 20.171.207.150:41308 [19/Feb/2025:21:47:39.115] ft_in_https~ be_forgejo/forgejo 0/0/1/273/279 200 17290 - - ---- 6/6/0/0/0 0/0 "GET https://forge.notnull.space/psw/web/commit/078833d79049280fcc13bfc790b2835d9819de28.patch HTTP/2.0" hdrs:"host: forge.notnull.space^M x-openai-host-hash: 132087407^M accept: */*^M from: gptbot(at)openai.com^M user-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)^M accept-encoding: gzip, br, deflate^M ^M "
...thousands of lines after...
Each of these thousands of requests scrapes data from my code forge, increasing the load on the server and making it respond slower and slower.
Basic Defense
My immediate goal of course was to stop the current AI slopbucket in its tracks to minimise the load on the server so it can get on with what it is supposed to be doing.
A robots.txt file helps to some extent, but as it turns out, some AI crawlers either do not honour it, or, if they are not explicitly Disallowed (i.e. they don't follow the * wildcard), decide they're allowed and get on with it.
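For reference, a robots.txt that names crawlers explicitly (rather than relying on the wildcard) looks like this; the crawler names are examples taken from the traffic described above:

```
# robots.txt - politely ask specific AI crawlers to stay away
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:
```

Of course, this only helps against crawlers that choose to honour it.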
As a further defense, I set up a basic RegEx-based blocklist in HAProxy to return a 403 HTTP response for any traffic coming from somewhere that matched certain patterns:
# /etc/haproxy.conf
frontend ft_in_https
# Block badbots
acl is-blockedagent hdr_sub(user-agent) -m reg -i -f /opt/haproxy-blocklist-agent.txt
http-request deny if is-blockedagent
... rest of config
and a bunch of RegEx patterns to block - be it old browser versions or names of AI crawlers.
# /opt/haproxy-blocklist-agent.txt
\WImagesiftBot\W
\WBytespider\W
\WPetalBot\W
\WAmazonBot\W
\WAndroid [0-8].0\W
\WChrome\/[0-7][0-8]\W
\WChrome\/11[0-9]\W
\WFirefox\/[0-7][0-8]\W
\W47\.82\W
\Wx-openai-host-hash\W
\Wgptbot
\Wgptbot\(at\)openai\.com
\WGPTBot\/1\.2;\W
...etc...
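It's worth sanity-checking that a captured user-agent string actually matches the blocklist before relying on it. A quick sketch, assuming GNU grep (whose ERE mode understands the \W "non-word character" extension) and using a throwaway copy of two of the patterns:

```shell
# Build a small test blocklist with two of the patterns above
blocklist=$(mktemp)
cat > "$blocklist" <<'EOF'
\Wgptbot
\WGPTBot\/1\.2;\W
EOF

# A user-agent string as seen in the HAProxy logs
ua='Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)'

# -E extended regex, -i case-insensitive, -q quiet, -f patterns from file
printf '%s\n' "$ua" | grep -Eqi -f "$blocklist" && result=blocked || result=allowed
echo "$result"
```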
This works... sort of. It removes the demand on the Forgejo server, so that's a lot better, but these bots still continuously send requests - it's just that HAProxy denies them. HAProxy can deal with this with ease, but it's still not nice to see it all in the logs.
Enter Anubis
Anubis is a fantastic, recently developed Anti-AI software product by Techaro. It works by giving the connecting computer a proof-of-work challenge which it must solve before being granted access to the requested content. It is a project that has started to pop up everywhere, including the United Nations.
For a visitor like you or me, a brief "Making sure you're not a bot" screen is displayed before the site loads; after that we can use the site as normal.
For an AI crawler, however, we can configure Anubis to give a harder-to-solve challenge, which takes much longer and significantly impacts how quickly that bot can send requests to our service.
This has a massive impact on how many connections HAProxy has to deal with, reducing the overall load immensely.
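To illustrate the idea (this is a toy sketch, not Anubis' actual implementation), a proof-of-work challenge boils down to: find a nonce such that a hash of challenge + nonce starts with a required number of zero digits. Each extra required zero multiplies the expected work, which is what lets a harder challenge slow a bot right down. Assuming sha256sum from coreutils:

```shell
# Toy proof-of-work: find a nonce so that sha256(challenge + nonce)
# begins with two zero hex digits (~1 in 256 hashes qualifies).
challenge="example-challenge-string"
nonce=0
while :; do
  hash=$(printf '%s%s' "$challenge" "$nonce" | sha256sum | cut -d' ' -f1)
  case "$hash" in
    00*) break ;;           # found a qualifying hash - challenge solved
  esac
  nonce=$((nonce + 1))      # otherwise keep searching
done
echo "solved: nonce=$nonce"
```

Verifying the answer takes the server a single hash, while finding it costs the client many; raising the required number of zeros raises that cost exponentially.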
Building Anubis
Anubis doesn't (yet) have any native FreeBSD binaries, but that's OK, it's a Go and NPM product so we can easily build it ourselves.
First we install Go on our system:
# At the time of writing Go 1.24.2 was the latest version - adjust as required.
$ wget https://go.dev/dl/go1.24.2.freebsd-amd64.tar.gz
# Extract it
$ tar xf go1.24.2.freebsd-amd64.tar.gz
# Copy the extracted go folder to /usr/local/ (for GOROOT reasons used later)
$ cp -R go /usr/local/
# Set your PATH and GOROOT environment variables as required
$ export GOROOT=/usr/local/go
$ export PATH=$PATH:$GOROOT/bin
Now, we can save ourselves a lot of trouble by downloading the Prebaked Tarball of Anubis, which means the Go module dependencies are included, and the static JS, CSS, etc. assets are already compiled.
We can build Anubis as follows.
# At the time of writing, v1.16.0 of Anubis was the latest
$ wget https://github.com/TecharoHQ/anubis/releases/download/v1.16.0/anubis-src-vendor-npm-1.16.0.tar.gz
# Extract it
$ tar xf anubis-src-vendor-npm-1.16.0.tar.gz
$ cd anubis-src-vendor-npm-1.16.0
# And build
$ make prebaked-build
All going well, you should have a ready-to-go binary at ./var/anubis which we can use.
Running Anubis as a Daemon
I wrote a basic rc.d script for FreeBSD, and the pull request for it has been merged.
This script runs Anubis under daemon(8) so it restarts if it crashes.
If you want to use it, you can copy it from the archive into rc.d
$ cp run/anubis.freebsd /usr/local/etc/rc.d/anubis
$ chmod +x /usr/local/etc/rc.d/anubis
Then, enable it and set its configuration path with sysrc
$ sysrc anubis_enable=YES
$ sysrc anubis_environment_file=/etc/anubis.env
You will need to have an anubis user account (i.e. adduser anubis) or set the anubis_user variable appropriately.
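For reference, the corresponding lines in /etc/rc.conf end up looking something like the following (the anubis_user value shown is an assumption based on the account created above):

```
# /etc/rc.conf
anubis_enable="YES"
anubis_environment_file="/etc/anubis.env"
anubis_user="anubis"
```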
Configuring Anubis
Now we have the software ready to run, we need to set up a bit of configuration so it works as we need.
First, we should set some basic environment variables - a full list is available on the Anubis website.
i.e. vim /etc/anubis.env
# The port Anubis listens on
BIND=:8923
# Your service endpoint; Forgejo runs on port 3000 by default on this local server
TARGET=http://localhost:3000
# Cookie Domain: IMPORTANT: this should be just the root domain of your service, so if your service
# runs on forgejo.example.com, just put example.com
COOKIE_DOMAIN=example.com
# A custom policy file location - more on this later
POLICY_FNAME=/etc/forgejo.json
# A hex encoded private key; run `openssl rand -hex 32` to generate one.
# Without setting this, a new key will be generated each time Anubis restarts.
ED25519_PRIVATE_KEY_HEX=###############################
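Generating the key and appending it to the env file can be done in one go. A sketch, using a temporary file here for illustration; on the real server the target would be /etc/anubis.env:

```shell
# Stand-in for /etc/anubis.env so this can be run anywhere
env_file=$(mktemp)

# 32 random bytes, hex encoded = 64 hex characters
key=$(openssl rand -hex 32)
printf 'ED25519_PRIVATE_KEY_HEX=%s\n' "$key" >> "$env_file"
```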
Creating A Custom Policy
Anubis by default will aggressively challenge everything that might be a browser, which may be fine for some uses, but often it won't be. So we can create a custom policy file that makes it nicer for nice things, and bad for bad things.
The examples in Anubis' Documentation are a pretty good fit here:
{"bots":
[
{
"name": "generic-bot-catchall",
"user_agent_regex": "(?i:bot|crawler|gpt)",
"action": "CHALLENGE",
"challenge": {
"difficulty": 16,
"report_as": 4,
"algorithm": "slow"
}
},
{
"name": "well-known",
"path_regex": "^/.well-known/.*$",
"action": "ALLOW"
},
{
"name": "favicon",
"path_regex": "^/favicon.ico$",
"action": "ALLOW"
},
{
"name": "robots-txt",
"path_regex": "^/robots.txt$",
"action": "ALLOW"
},
{
"name": "generic-browser",
"user_agent_regex": "Mozilla",
"action": "CHALLENGE"
}
]
}
The first definition makes everything connecting with a user-agent string containing "bot", "crawler", or "gpt" solve a high-difficulty slow challenge.
The next three say that requests for the .well-known folder, favicon, or robots.txt files are just allowed; we don't want to challenge those. For example, certbot/letsencrypt will need access to .well-known/acme-challenge to verify the certificate, and we don't want to break that.
Finally, the "generic-browser" definition shows a basic challenge for everything else. This is what casual visitors will briefly see when they visit for the first time to confirm they are not a bot.
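A malformed policy file is an easy way to break things at startup, so it's worth checking the JSON parses before pointing Anubis at it. A minimal sketch, run here against an inline copy of one rule (on the server you'd check /etc/forgejo.json instead):

```shell
# Stand-in policy file with a single rule, so this runs anywhere
policy=$(mktemp)
cat > "$policy" <<'EOF'
{"bots":[{"name":"well-known","path_regex":"^/.well-known/.*$","action":"ALLOW"}]}
EOF

# json.tool exits non-zero on invalid JSON
python3 -m json.tool "$policy" > /dev/null && result=valid || result=invalid
echo "$result"
```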
Ready to Go
There we go. With all that in place, we can now run Anubis, point HAProxy at Anubis' listening port, and put it to work!
Run Anubis
$ service anubis start
Configure HAProxy
... rest of config
backend be_forgejo
mode http
# server forgejo 192.168.200.10:3000 # old direct to forgejo route. disabled
server forgejo 192.168.200.10:8923 # new route to anubis
Monitoring and extra bits
The rc.d script provided will log stdout and stderr to /var/log/anubis.log
so you can see the status of things and any errors.
At this time, Anubis must be run once per endpoint. That is to say, if you want to run Anubis for Forgejo, and a website, etc., you will need to run at least two instances of Anubis.
I am only running a single instance at the moment, but I have a jail that updates and builds Anubis. This jail outputs the anubis binary to a shared location other jails can use and run independently of each other. This allows me to run numerous instances of it, but I only have to update the Anubis binary in one place.
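As a sketch of what two instances might look like (the file names, ports, and targets here are hypothetical), each instance simply gets its own environment file with a distinct BIND and TARGET:

```
# /etc/anubis-forgejo.env
BIND=:8923
TARGET=http://localhost:3000
COOKIE_DOMAIN=example.com

# /etc/anubis-website.env
BIND=:8924
TARGET=http://localhost:8080
COOKIE_DOMAIN=example.com
```

HAProxy then routes each frontend to the matching Anubis port.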
Final Note
If you use Anubis, and are able to do so, I strongly encourage you to donate what you can to the project so they can continue to maintain it into the future.