Thanks to this post from Adam Johnson, I’ve now updated my configuration to block OpenAI¹ and Meta² from crawling this website to feed their LLMs.
If you would like to do the same, you only need to add these entries to your robots.txt:
User-agent: GPTBot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: GoogleOther
Disallow: /

User-agent: Google-Extended
Disallow: /
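To sanity-check the rules before deploying them, one option is Python's standard-library urllib.robotparser. This is a minimal sketch: the rules are embedded as a string so it runs offline, https://example.com/posts/ is a placeholder URL, and Googlebot is included only as a control that should stay allowed. The crawlers are identified here by their product tokens (GPTBot, FacebookBot, and so on) rather than their full User-Agent strings. This parser's matching is simpler than what real crawlers implement, so treat the result as a sanity check, not proof of how each bot behaves.

```python
from urllib.robotparser import RobotFileParser

# The rules from above, embedded as a string so the check runs offline.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: GoogleOther
Disallow: /

User-agent: Google-Extended
Disallow: /
"""

# Product tokens to test; Googlebot is a control that should remain allowed.
AGENTS = ("GPTBot", "FacebookBot", "GoogleOther", "Google-Extended", "Googlebot")


def check(robots_txt: str, url: str, agents: tuple[str, ...] = AGENTS) -> dict[str, bool]:
    """Return {agent: may_fetch} for each user-agent token against url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without fetching robots.txt over the network.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, allowed in check(ROBOTS_TXT, "https://example.com/posts/").items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

To check a live site instead, you could create the parser with your own robots.txt URL and call read() before can_fetch().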
Updated: Added GoogleOther, Google’s bot for experiments, to the block list. It may or may not be used to train Google Bard.
Updated: Added Google-Extended, the robots.txt token Google provides specifically for opting out of AI training.