AI Crawler Access Analysis

See exactly which AI crawlers can — and can't — reach your content. We analyse your robots.txt configuration against 40+ known AI bots so you can make informed decisions about AI discoverability.

What It Does

Your robots.txt file is the first thing AI crawlers check before accessing your content. If it blocks them — intentionally or not — your content is invisible to AI search engines like ChatGPT, Perplexity, and Google AI Overviews.

GEO Lantern's AI Crawler Access analysis fetches your robots.txt and evaluates it against over 40 known AI crawler user-agents. We show you the exact access status for each major AI bot: allowed, blocked, or not specifically addressed.

This category accounts for 20% of your AI readiness score. Crucially, it's a binary gatekeeper — if AI crawlers are blocked, nothing else matters because they simply cannot see your content. That's why we make it one of the first things to check.

Bots We Check For

These are the major AI crawlers — GEO Lantern checks for over 40 in total.

GPTBot (OpenAI, Search & Training): Powers ChatGPT search and browsing features.

ChatGPT-User (OpenAI, Search): Used when ChatGPT users actively browse the web during conversations.

ClaudeBot (Anthropic, Search & Training): Crawls content for Claude's web search capabilities.

PerplexityBot (Perplexity AI, Search): Fetches content for Perplexity's real-time search answers.

Bytespider (ByteDance, Search & Training): Powers TikTok search and ByteDance AI products.

Google-Extended (Google, Training): Controls whether your content is used for Gemini and AI training (separate from Googlebot).

Applebot-Extended (Apple, Training): Controls content usage for Apple Intelligence features.

cohere-ai (Cohere, Training): Crawls for Cohere's enterprise AI products and search.

How It Works

We evaluate your robots.txt against 40+ known AI crawlers in four steps.

1. Fetch your robots.txt: GEO Lantern retrieves your robots.txt file from the standard location at your domain root.

2. Parse all directives: We parse every User-agent block, Allow/Disallow rule, and any experimental directives like content-usage or disallow-ai-training.

3. Check 40+ AI crawlers: Each known AI crawler is evaluated against your rules to determine whether it is allowed, blocked, or has no specific directive.

4. Report and recommend: You receive a clear breakdown showing the access status of each major AI crawler, with recommendations based on your visibility goals.
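The core of steps 1–3 can be sketched with Python's standard-library robots.txt parser. This is a simplified illustration, not GEO Lantern's implementation: the crawler list below is a small hand-picked subset, and the full 40+ bot database is not shown.

```python
from urllib.robotparser import RobotFileParser

# Illustrative subset of AI crawler user-agents (assumed names,
# per each vendor's public documentation).
AI_CRAWLERS = ["GPTBot", "ChatGPT-User", "ClaudeBot",
               "PerplexityBot", "Bytespider", "Google-Extended"]

def check_ai_access(robots_txt: str,
                    site_url: str = "https://example.com/") -> dict:
    """Return an allowed/blocked verdict per crawler for the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Note: this yields only allowed/blocked; reporting "no specific
    # directive" would require inspecting the raw User-agent groups.
    return {bot: "allowed" if parser.can_fetch(bot, site_url) else "blocked"
            for bot in AI_CRAWLERS}

robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""
print(check_ai_access(robots))
```

With this sample file, GPTBot is reported as blocked by its dedicated rule, while every other crawler falls through to the permissive wildcard group.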

Frequently Asked Questions

What are AI crawlers?

AI crawlers are automated bots operated by AI companies to fetch web content. Unlike traditional search engine crawlers (like Googlebot), AI crawlers gather content specifically for AI-powered features — search answers, chatbot responses, and AI model training. Major AI crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity AI), and Google-Extended (Google).

How do I control which AI crawlers can access my site?

You control AI crawler access through your robots.txt file. Each AI crawler has a specific User-agent name. You can allow or block individual crawlers by adding rules like "User-agent: GPTBot" followed by "Allow: /" or "Disallow: /". This lets you grant access to search-tier crawlers while blocking training-tier crawlers if you prefer.
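For instance, a robots.txt that grants OpenAI's GPTBot access while opting out of Google's AI training (a hypothetical policy, not a recommendation) could look like:

```
User-agent: GPTBot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
```

Crawlers follow the most specific User-agent group that matches their name, so the named groups here take precedence over the wildcard block for GPTBot and Google-Extended.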

Should I block or allow AI crawlers?

It depends on your goals. If you want your content to appear in AI-powered search results (ChatGPT, Perplexity, Google AI Overviews), you need to allow the relevant crawlers. Blocking all AI crawlers means your content will not be referenced by these systems. Many site owners choose to allow search-tier crawlers while blocking training-only crawlers.

What is the difference between search-tier and training-tier crawlers?

Search-tier crawlers fetch content to provide real-time answers in AI search products — when someone asks a question and the AI retrieves your page to formulate a response. Training-tier crawlers gather content to train or fine-tune AI models. Some crawlers like GPTBot operate in both tiers. Google-Extended is purely training-tier and is separate from the main Googlebot search crawler.

How many AI crawlers does GEO Lantern check for?

GEO Lantern checks for over 40 known AI crawler user-agents. This includes major crawlers from OpenAI, Anthropic, Google, Apple, Meta, Perplexity, ByteDance, Cohere, and others. We regularly update our crawler database as new AI bots are deployed.

What if my robots.txt doesn't mention AI crawlers at all?

If your robots.txt has no specific rules for AI crawlers, their access depends on your wildcard rules. A "User-agent: *" group with "Allow: /" gives all AI crawlers access to your site; a "User-agent: *" group with "Disallow: /" blocks all crawlers, AI bots included. GEO Lantern analyses your complete robots.txt to determine the effective access for each AI bot.
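This fallback behaviour can be demonstrated with Python's standard-library parser (a sketch; "GPTBot" stands in for any AI crawler that has no dedicated rule):

```python
from urllib.robotparser import RobotFileParser

# Two robots.txt files with no AI-specific rules, only a wildcard group.
permissive = "User-agent: *\nAllow: /\n"
restrictive = "User-agent: *\nDisallow: /\n"

def allowed(robots_txt: str, bot: str) -> bool:
    """True if `bot` may fetch the site root under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, "https://example.com/")

# With no GPTBot-specific group, access follows the wildcard rule.
print(allowed(permissive, "GPTBot"))   # wildcard Allow applies
print(allowed(restrictive, "GPTBot"))  # wildcard Disallow applies
```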

Ready to See Your Score?

Run a free AI readiness scan and discover exactly how AI search engines perceive your website.