What is cohere-ai?
The “cohere-ai” user-agent is associated with Cohere, a company that builds large language models and AI infrastructure. The crawler has been observed accessing publicly available websites, likely to collect training data for Cohere’s proprietary models. Cohere provides minimal official documentation for the bot, but bot-tracking platforms such as DataDome reference it, and it shows up in web server logs worldwide.
Who is operating cohere-ai?
Cohere Inc. operates the cohere-ai crawler. Based in Toronto, Canada, Cohere specializes in natural language processing technologies, offering APIs for text generation, classification, and retrieval-augmented generation. The company’s website outlines its focus on enterprise-grade AI systems and foundational models.
Why should you be interested in cohere-ai?
For webmasters, cohere-ai matters because it is part of the wave of crawlers designed to gather data for LLMs. Your content may be ingested into datasets that influence how commercial models respond or behave. If you prefer that your website not be used for such training, you need to understand and control this bot’s access. The crawler has been reported to sometimes ignore robots.txt, so a robots.txt rule alone may not be enough and manual mitigation may be necessary.
How to block cohere-ai?
1. Robots.txt File:
Add the following rule to your robots.txt file:
# block cohere-ai
User-agent: cohere-ai
Disallow: /
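One way to sanity-check a rule like this before deploying it is to parse it with Python's standard `urllib.robotparser`. The sketch below is illustrative only: the `is_allowed` helper and the `example.com` URL are assumptions made for the example, and the check only confirms that the rule is well-formed and targets the intended user-agent token. It says nothing about whether the crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section, as it would appear in robots.txt.
ROBOTS_TXT = """\
# block cohere-ai
User-agent: cohere-ai
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for this example. Pass the short product token
    (e.g. "cohere-ai"), not the full User-Agent header string.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    print(is_allowed(ROBOTS_TXT, "cohere-ai", "https://example.com/page"))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/page"))
```

With the rule above, the first call should report that cohere-ai is disallowed, while other bots remain unaffected.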
2. User-Agent Filtering:
Block “cohere-ai” at the web server level using .htaccess, Nginx directives, or equivalent configuration.
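Where editing .htaccess or Nginx configuration is not an option, the same filter can be applied in the application itself. The following is a minimal sketch, assuming a Python site served through WSGI; it is an application-level stand-in for the server directives mentioned above, not a replacement for them. The names `block_user_agent` and `hello_app` are hypothetical.

```python
BLOCKED_TOKEN = "cohere-ai"


def block_user_agent(app, token: str = BLOCKED_TOKEN):
    """Wrap a WSGI app so requests whose User-Agent contains token get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        # Case-insensitive substring match, so the full header
        # "Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com)" is caught.
        if token.lower() in user_agent.lower():
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Placeholder application standing in for a real site."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


# The object a WSGI server would be pointed at.
application = block_user_agent(hello_app)
```

User-agent strings are trivially spoofed, so a filter like this only stops crawlers that identify themselves honestly.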
3. Monitoring and Firewall:
Inspect your logs for user-agent patterns matching “cohere-ai” and apply IP-based blocks as required. DataDome and similar platforms report the bot operates across a range of hosting providers.
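The log inspection step can be sketched as a short script that tallies client IPs per matching user-agent. This example assumes access logs in the common "combined" format, where the user-agent is the last quoted field on each line; the `count_ips` helper is hypothetical, and the sample lines use made-up data with reserved documentation IP addresses.

```python
import re
from collections import Counter
from typing import Iterable

# First field is the client IP; the last quoted field is the User-Agent.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def count_ips(lines: Iterable[str], token: str = "cohere-ai") -> Counter:
    """Count requests per client IP whose User-Agent contains token."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and token.lower() in match.group("ua").lower():
            counts[match.group("ip")] += 1
    return counts


# Fabricated sample data for illustration only.
SAMPLE = [
    '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com)"',
    '198.51.100.4 - - [10/Oct/2024:13:55:40 +0000] "GET /about HTTP/1.1" 200 900 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"',
    '203.0.113.7 - - [10/Oct/2024:13:56:01 +0000] "GET /blog HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com)"',
]

if __name__ == "__main__":
    for ip, hits in count_ips(SAMPLE).most_common():
        print(ip, hits)
```

In practice the function would be fed an open log file instead of `SAMPLE`. Because the bot is reported to operate across many hosting providers, the resulting IP list is likely to change over time and is better treated as input for review than as a permanent blocklist.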
About the bot
Owner: Cohere Inc.
Owner URL: cohere.com
Bot URL: cohere.com
Bot User Agent: Mozilla/5.0 (compatible; cohere-ai; +https://cohere.com)
Respects robots.txt: No