Cloudflare has introduced a new free tool to prevent AI bots from scraping its clients' websites for content to train large language models.
This tool is available to all Cloudflare customers, including those on free plans, and will be updated over time to counter new bot fingerprints.
Key Takeaways
- Cloudflare's new tool blocks AI bots from scraping website content.
- The tool is available to all Cloudflare customers, including free plan users.
- 85.2% of Cloudflare's customers have chosen to block AI bots.
- The most active bots include Bytespider, GPTBot, Amazonbot, and ClaudeBot.
- Cloudflare's machine learning models help detect and block evasive bots.
Cloudflare's New Tool Against AI Bots
Cloudflare has launched a new tool designed to block AI bots from scraping website content. This tool is available to all Cloudflare customers, including those on free plans. The company has committed to updating the tool over time to counter new bot fingerprints as they are identified.
Customer Response and Bot Activity
According to Cloudflare's internal data, 85.2% of its customers have opted to block AI bots, including those that properly identify themselves. The most active bots over the past year include Bytespider, which attempted to access 40% of websites protected by Cloudflare, and GPTBot, which attempted to access 35%. Amazonbot and ClaudeBot also ranked among the top four AI crawlers by number of requests on Cloudflare's network.
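Bots that identify themselves are the easy case, because a site can simply match the name in the request's User-Agent header. The minimal Python sketch below illustrates that simplest form of blocking. It is not how Cloudflare's tool works. The function name `is_declared_ai_crawler` and the token list are hypothetical, and the sketch assumes each crawler's User-Agent contains its name as a substring. A bot that sends a browser-like User-Agent would pass this check, which is why evasive crawlers need detection methods beyond header matching.

```python
# Minimal sketch: flag requests whose User-Agent header names a known AI crawler.
# ASSUMPTION: these substrings are illustrative; real crawlers' exact
# User-Agent strings may differ, and this is not Cloudflare's implementation.
AI_CRAWLER_TOKENS = ("Bytespider", "GPTBot", "Amazonbot", "ClaudeBot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent names one of the listed AI crawlers.

    Matching is case-insensitive. Only bots that identify themselves are
    caught; a crawler that sends a browser-like User-Agent passes unnoticed.
    """
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)


if __name__ == "__main__":
    # Hypothetical example headers, for illustration only.
    sample_user_agents = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Gecko/20100101 Firefox/127.0",
    ]
    for ua in sample_user_agents:
        verdict = "block" if is_declared_ai_crawler(ua) else "allow"
        print(f"{verdict}: {ua}")
```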
Challenges in Blocking AI Bots
Blocking AI bots entirely is proving to be a significant challenge. In the race to build models faster, some companies have skirted or outright broken existing rules on blocking scrapers. For instance, Perplexity AI was recently accused of scraping websites without permission. Cloudflare's large-scale backend operations and machine learning models are central to detecting and blocking these evasive bots.
Machine Learning and Bot Detection
Cloudflare's machine learning models play a vital role in identifying and blocking AI bots. These models analyse traffic patterns and can detect bots that attempt to mimic real user behaviour. The company also uses digital fingerprinting to track and block these bots, helping content creators maintain control over their content.
Future Updates and Continuous Monitoring
Cloudflare has pledged to continue monitoring bot activity and updating its tool to block new bot fingerprints. The company aims to help content creators thrive on the internet while maintaining full control over how their content is used for AI training or inference.
Conclusion
Cloudflare's new tool represents a significant step in the fight against AI bots scraping website content. By combining machine learning with continuous updates, Cloudflare aims to provide a robust defence for content creators and to protect their work from unauthorised use in AI model training.
Sources
- Cloudflare is taking a stand against AI website scrapers, Engadget.
- Cloudflare offers 1-click block against web-scraping AI bots, The Register.
- Free Cloudflare tool counters scraping by AI bots ("Gratis tool van Cloudflare gaat scraping door AI-bots tegen"), Tweakers.
- OECD AI Policy Observatory Portal, OECD.