What is Google-Extended?
Google-Extended is not a crawler but a standalone product token that is recognized only in robots.txt. It has no separate HTTP user-agent string: crawling is done with Google’s existing user agents, and the token serves purely as a control signal. Its role is to let site owners decide whether their content can be used to train Google’s generative AI models (such as Gemini, formerly Bard). It does not trigger new traffic. Details are documented at https://developers.google.com/search/help/google-extended.
Who operates Google-Extended?
Google-Extended is operated by Google LLC. It is part of Google’s infrastructure for data governance, specifically tied to large model training. The mechanism is public and implemented via robots.txt, but enforcement depends on Google’s internal compliance systems.
Why should you be interested in Google-Extended?
This token matters because it decouples search indexing from AI model ingestion. You may want to appear in search but not feed Google’s training datasets. If you don’t disallow it, Google treats your content as available for reuse in its models. That has legal, editorial, and competitive implications, especially if you produce high-value or original material.
How to block Google-Extended?
1. robots.txt directive:
Add the following rule to your robots.txt file:
# block Google-Extended
User-agent: Google-Extended
Disallow: /
2. Optional:
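To block Googlebot from crawling the site as well as blocking AI reuse, also add the rule below. Be aware that this will remove most of your visibility in Google Search: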
User-agent: Googlebot
Disallow: /
3. Keep robots.txt clean:
Declare Google-Extended in its own group. Rules written for other named agents, such as Googlebot, do not carry over to it.
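To sanity-check the rules above before deploying them, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt text, the helper name is_allowed, and the example.com URL are placeholders for illustration. Python's parser matches agent names more loosely than Google's own matcher does, so treat the result as a rough local check rather than a guarantee of how Google will interpret the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: block Google-Extended only,
# leaving every other agent (including Googlebot) unrestricted.
ROBOTS_TXT = """\
# block Google-Extended
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-page"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

With this file, the check should report Google-Extended as disallowed and Googlebot as allowed, which is the "visible in search, excluded from training" setup described above. Adding the optional Googlebot group from step 2 to ROBOTS_TXT would make both agents come back as disallowed.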
About the bot
Owner: Google LLC
Owner URL: google.com
Bot URL: developers.google.com
Bot User Agent: Google-Extended
Respects robots.txt: Yes