This is a CPU graph of a web host that began having AI bots absolutely slam it starting at 4am UTC.
I blocked all Chrome user agents older than 120 at about 10:45 UTC.
These AI bots aren't using "nice" names like ChatGPT or AmazonBot. No, more like Chrome/116 or similar and they come from ALL OVER.
I am so tempted to put Iocaine or Nepenthes on the machine to generate Markov Chain garbage to poison the well, but I'd have to have Nginx map the older user agent string with regex. It probably could be done but this might piss off my employer.
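For what it's worth, the "older Chrome" match can be done as a pure regex with no numeric comparison, which is what a regex-only matcher such as an Nginx map would need. Here is a minimal Python sketch of the pattern; the names `OLD_CHROME_RE` and `is_old_chrome` are made up for illustration, and the cutoff of 120 is the one mentioned above. The same PCRE-style pattern could presumably be reused in a map on the user agent variable, but that part is untested here.

```python
import re

# Matches "Chrome/<major>." where major is 0-99 or 100-119, i.e. older than 120.
# Hypothetical name; the cutoff of 120 comes from the post above.
OLD_CHROME_RE = re.compile(r"Chrome/(?:[1-9]?[0-9]|1[01][0-9])\.")


def is_old_chrome(user_agent: str) -> bool:
    """Return True if the User-Agent claims a Chrome major version below 120."""
    return OLD_CHROME_RE.search(user_agent) is not None


if __name__ == "__main__":
    ua = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/116.0.0.0 Safari/537.36"
    print(is_old_chrome(ua))
```

Keep in mind that other Chromium-based browsers also send a Chrome/ token, so this will catch real users on old versions of those too.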
@pertho Use some Lua or JS in nginx to generate a session ID for each initial hit, inject that in the URI and return a 301/302, and then drop (return 444) anything that comes in on anything but / without a valid token. Use the OpenResty shared_dict (shared memory key-value store) to keep valid tokens around, and similarly drop anything with an expired token. And rate-limit on a per-token basis (remember to sha2 your URI before adding to the rate limiter to avoid memory fragmentation).
It won't entirely remove your pain, but it might help a fair bit.
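Roughly, the flow looks like the sketch below. It is plain Python standing in for the Lua/JS you would actually run inside nginx, with ordinary dicts playing the role of the shared-memory store. All names and numbers (`handle`, `TOKEN_TTL`, `RATE_LIMIT`, `RATE_WINDOW`) are assumptions for illustration, and it reads "sha2 your URI" as "hash whatever identifies the client so every limiter key has the same size", which here is the token pulled from the URI.

```python
import hashlib
import secrets
import time

TOKEN_TTL = 300.0    # seconds a token stays valid (assumed value)
RATE_LIMIT = 5       # requests allowed per token per window (assumed value)
RATE_WINDOW = 10.0   # window length in seconds (assumed value)

tokens = {}  # token -> expiry timestamp; stand-in for a shared-memory dict
hits = {}    # fixed-length hashed key -> (window_start, count)


def handle(path, token=None, now=None):
    """Return (status, redirect_location) for a single request."""
    now = time.time() if now is None else now

    if token is None:
        if path != "/":
            return 444, None                 # deep link without a token: drop
        token = secrets.token_hex(16)
        tokens[token] = now + TOKEN_TTL
        return 302, f"/{token}/"             # redirect with the token in the URI

    expiry = tokens.get(token)
    if expiry is None or expiry < now:
        tokens.pop(token, None)              # unknown or expired token: drop
        return 444, None

    # Hash the key so every limiter entry has the same length.
    key = hashlib.sha256(token.encode()).hexdigest()
    start, count = hits.get(key, (now, 0))
    if now - start >= RATE_WINDOW:
        start, count = now, 0                # start a fresh window
    if count >= RATE_LIMIT:
        return 429, None                     # over the per-token limit
    hits[key] = (start, count + 1)
    return 200, None


if __name__ == "__main__":
    status, location = handle("/")
    print(status, location)
    print(handle("/search?q=a"))             # no token on a deep link
    print(handle("/search?q=a", token=location.strip("/")))
```

A fixed window is used here only because it is short; whatever limiter you already have would slot in the same way, keyed on the hashed token.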
@sullybiker They get stuck in a search page that has tons of query strings with many permutations.