Fix: robots.txt Wildcard Pattern Blocking Too Many Pages
Wildcard patterns in robots.txt use * (match any sequence of characters) and $ (match the end of the URL). A pattern like Disallow: /*/page/ intended to block pagination URLs such as /blog/page/2/ also blocks any URL whose path contains /page/, including /products/page/overview/.
The Problem
Robots.txt pattern matching is literal prefix matching with two wildcards, not regex. Many developers write patterns assuming regex behaviour. Disallow: /*.pdf$ correctly blocks PDF files. But Disallow: /search* behaves exactly like Disallow: /search: every rule is a prefix match, so the trailing * is redundant, and both forms block any URL whose path begins with /search, including /search-engine-marketing/ blog posts.
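The matching rules above can be sketched in a few lines of Python. This is a simplified model of Google-style robots.txt matching (not Google's actual parser): translate * into "any sequence" and a trailing $ into an end anchor, and treat everything else as a literal prefix. The helper names are illustrative, not from any library.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Model a robots.txt path pattern as a regex (simplified sketch).

    '*' matches any sequence of characters, a trailing '$' anchors
    the end of the URL, and everything else is a literal prefix
    match. No other regex syntax applies.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    # Prefix match unless '$'-anchored
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_blocked(pattern: str, path: str) -> bool:
    return robots_pattern_to_regex(pattern).match(path) is not None

# 'Disallow: /search' is a prefix match, so it also blocks these:
print(is_blocked("/search", "/search-engine-marketing/"))   # True
print(is_blocked("/search", "/search-results/"))            # True
# '$' anchors the end, so '/search$' blocks only /search itself:
print(is_blocked("/search$", "/search"))                    # True
print(is_blocked("/search$", "/search-engine-marketing/"))  # False
```

Running the checks shows why the prefix rule surprises people: nothing in /search says "stop at a word boundary", so any longer path that starts with those characters is caught.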
The Fix
# OVERLY BROAD: blocks /search-engine-tips/, /search-results/, etc.
# Disallow: /search

# CORRECT: only blocks /search itself and /search?q= style URLs
Disallow: /search$
Disallow: /search?

# OVERLY BROAD: blocks every URL with ? anywhere
# Disallow: /*?

# CORRECT: only block specific query parameters, on any path
# (each pattern matches when the parameter comes first; add /*&sort=
# style rules if the parameter can appear later in the query string)
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?ref=
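Before deploying, it helps to sanity-check the corrected rules against real URLs from your site. The sketch below uses the same simplified Google-style matching model as above (prefix match, * as any sequence, trailing $ as end anchor); the rule list and sample URLs are illustrative.

```python
import re

def matches(pattern: str, path: str) -> bool:
    # '*' -> any sequence; trailing '$' -> end anchor; else prefix match
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "^" + re.escape(body).replace(r"\*", ".*") + ("$" if anchored else "")
    return re.match(regex, path) is not None

# The corrected Disallow rules from the fix above
rules = ["/search$", "/search?", "/*?sort=", "/*?filter=", "/*?ref="]

urls = [
    "/search",                # blocked by /search$
    "/search?q=shoes",        # blocked by /search?
    "/search-engine-tips/",   # allowed: no longer a bare prefix match
    "/products/?sort=price",  # blocked by /*?sort=
    "/products/featured/",    # allowed
]

for url in urls:
    blocked = any(matches(rule, url) for rule in rules)
    print(f"{url}: {'BLOCKED' if blocked else 'ALLOWED'}")
```

This mirrors the BLOCKED/ALLOWED output a URL tester gives you, so you can catch an overly broad pattern locally before it ever reaches production.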
Use the ConfigClarity robots.txt Validator's URL Tester to check your patterns against specific URLs before deploying. The tester shows BLOCKED or ALLOWED for any URL against any bot, using the same parsing rules Google documents for Googlebot.
Validate your robots.txt live — fetch any URL and get a corrected file in one click.
Open robots.txt Validator →