Diffbot

What is Diffbot?

Diffbot is a commercial web crawler and data extraction system designed to transform unstructured web content into structured data. It is used to build machine-readable knowledge graphs and APIs for search, ecommerce, competitive intelligence, and AI training. The bot parses websites, extracts entities, and stores structured information, often without requiring explicit markup. Its infrastructure has been active for over a decade and operates at scale.

Who operates Diffbot?

Diffbot is operated by Diffbot Technologies Corp., a company headquartered in California. It provides automated knowledge extraction as a service. Clients include search engine companies, enterprise data platforms, and research labs. Its Knowledge Graph is built entirely from web crawls and powers a variety of downstream products. Public documentation is available at https://www.diffbot.com/bot/.

Why should you be interested in Diffbot?

Diffbot crawls entire websites, not just individual pages, and stores the extracted data in commercial databases. If you publish product listings, company info, or structured content, Diffbot likely collects and monetizes it. This can lead to indirect competition, loss of content control, or untracked reuse. Its crawlers have been flagged for persistent deep scraping across e-commerce and directory-type sites.

How to block Diffbot?

1. robots.txt File:
Add the following rule to your robots.txt file:

# Block Diffbot
User-agent: Diffbot
Disallow: /

2. Subnet Filtering:
Diffbot publishes IP ranges at https://www.diffbot.com/bot/. Block those at the network level for stronger enforcement.

3. Behavior Profiling:
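Diffbot sends structured requests at high frequency. Set traps, such as hidden links that only a crawler would follow, or monitor your access logs for unusually high request rates to detect unexpected scraping patterns.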
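
The three measures above can be sketched in Python using only the standard library. This is a minimal illustration, not a production filter: the function names (robots_allows, is_blocked_ip, noisy_clients) are hypothetical, the IP ranges are placeholders taken from the reserved documentation blocks rather than Diffbot's actual ranges, and the window and threshold values are arbitrary assumptions you would tune against your own traffic.

```python
import ipaddress
from collections import Counter
from urllib.robotparser import RobotFileParser

# 1. robots.txt: the rule from this page, used to check what it denies.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /
"""

def robots_allows(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# 2. Subnet filtering: PLACEHOLDER ranges (reserved documentation blocks),
#    not Diffbot's real ranges. Replace them with the ranges you intend to block.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

def is_blocked_ip(addr: str) -> bool:
    """Return True if addr falls inside any blocked range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED_RANGES)

# 3. Behavior profiling: count requests per client in fixed time windows.
def noisy_clients(requests, window_seconds=60, threshold=100):
    """requests is an iterable of (timestamp_in_seconds, ip) pairs.

    Returns the set of IPs that sent more than `threshold` requests
    inside any single window of `window_seconds`.
    """
    counts = Counter((ip, int(ts // window_seconds)) for ts, ip in requests)
    return {ip for (ip, _window), n in counts.items() if n > threshold}
```

Note that robots_allows is called with the product token (Diffbot) rather than the full user-agent header, because Python's parser compares only the part of the name before the first slash. The robots.txt check tells you what a compliant crawler should do; the IP and rate checks are the parts that enforce anything.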

About the bot

Owner: Diffbot Technologies Corp.
Owner URL: diffbot.com
Bot URL: diffbot.com/bot/
Bot User Agent: Mozilla/5.0 (compatible; Diffbot/3.0; +https://www.diffbot.com/bot/)
Respects robots.txt: Yes
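
To spot this user agent in your own request handling or log analysis, a simple pattern match on the product token is usually enough. The sketch below is an assumption-laden example: the function name is_diffbot_user_agent is hypothetical, and because a user-agent header can be spoofed or omitted, a match is a hint rather than proof.

```python
import re

# Match the "Diffbot" product token anywhere in the header, case-insensitively.
DIFFBOT_UA = re.compile(r"\bDiffbot\b", re.IGNORECASE)

def is_diffbot_user_agent(user_agent: str) -> bool:
    """Return True if the User-Agent header names Diffbot."""
    return bool(DIFFBOT_UA.search(user_agent or ""))
```

Matching on the token instead of the full string keeps the check working if the version number in the header changes.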
