AI2Bot-Dolma

What is AI2Bot-Dolma?

AI2Bot-Dolma is a web crawler operated by the Allen Institute for AI (AI2) to build the Dolma dataset, an open corpus used to train models such as OLMo. According to AI2’s official documentation, the bot collects publicly accessible web data, with a focus on ethically sourced and transparently documented content. It identifies itself with the user-agent token “AI2Bot-Dolma”.

Who operates AI2Bot-Dolma?

The operator of this bot is the Allen Institute for AI, a non-profit research institute founded by Paul Allen. AI2 is known for initiatives like Semantic Scholar, OLMo, and PRIOR. More information about its mission and governance can be found on its official site: https://allenai.org.

Why should you be interested in AI2Bot-Dolma?

If you operate a website, you may see traffic from AI2Bot-Dolma. Its purpose is non-commercial and research-focused, but its crawling activity can still affect bandwidth, analytics accuracy, or content exposure. AI2 publicly discloses its crawling policies and adheres to standard robots.txt rules. Webmasters who do not want their content used to train open-source language models have clear blocking mechanisms available.

How to block AI2Bot-Dolma?

1. Robots.txt File:
Add the following rule to your robots.txt file:

# block AI2Bot-Dolma

User-agent: AI2Bot-Dolma
Disallow: /

2. Server Configuration:
You can configure your web server (e.g., Apache or Nginx) to deny requests whose User-Agent header contains the string “AI2Bot-Dolma”.

3. Firewall Rules:
Although AI2 does not publicly list the IP addresses used by AI2Bot-Dolma, administrators can monitor server logs to identify the addresses it uses and block them if required.
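
The log monitoring mentioned in step 3 could be sketched as follows. This is a minimal example, not an official tool: it assumes access logs in the common "combined" format used by default in Apache and Nginx, and the function name bot_ips and the sample log lines (with documentation-range IP addresses) are made up for illustration.

```python
import re
from collections import Counter

# Token to look for inside the User-Agent field (case-insensitive).
BOT_TOKEN = "AI2Bot-Dolma"

# Assumed "combined" log format:
# IP ident user [time] "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def bot_ips(lines, token=BOT_TOKEN):
    """Count requests per client IP whose User-Agent contains the token."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and token.lower() in match.group("ua").lower():
            counts[match.group("ip")] += 1
    return counts


# Hypothetical sample lines; in practice, iterate over an open log file.
sample = [
    '192.0.2.10 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; AI2Bot-Dolma; +https://allenai.org/crawler)"',
    '192.0.2.10 - - [10/Oct/2024:13:55:40 +0000] "GET /about HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; AI2Bot-Dolma; +https://allenai.org/crawler)"',
    '198.51.100.7 - - [10/Oct/2024:13:56:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

for ip, hits in bot_ips(sample).most_common():
    print(ip, hits)
```

Because a User-Agent header can be set to anything by the client, addresses found this way are best treated as candidates to review rather than as a confirmed list of AI2 crawler IPs.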

About the bot

Owner: Allen Institute for AI
Owner URL: allenai.org
Bot URL: allenai.org/crawler
Bot User Agent: Mozilla/5.0 (compatible; AI2Bot-Dolma; +https://allenai.org/crawler)
Respects robots.txt: Yes
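
Since the bot is listed as respecting robots.txt, it can be useful to confirm that the blocking rule from the robots.txt step parses the way you expect. The sketch below uses Python's standard urllib.robotparser module; the helper name is_allowed and the example.com URL are assumptions for illustration, and the check only shows how this one parser reads the rule, not how AI2's crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The rule from the robots.txt step, inlined so the check runs offline.
ROBOTS_TXT = """\
# block AI2Bot-Dolma

User-agent: AI2Bot-Dolma
Disallow: /
"""


def is_allowed(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Pass the short user-agent token, which is what robots.txt groups name.
blocked = is_allowed(ROBOTS_TXT, "AI2Bot-Dolma", "https://example.com/any/page")
others = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/any/page")

print("AI2Bot-Dolma allowed:", blocked)
print("SomeOtherBot allowed:", others)
```

With the rule above, the first check should report that AI2Bot-Dolma is not allowed, while an unrelated crawler remains unaffected.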
