Fix: robots.txt Wildcard Pattern Blocking Too Many Pages
Wildcard patterns in robots.txt use * (match any sequence of characters) and $ (match the end of the URL). A pattern like Disallow: /*/page/ intended to block pagination URLs such as /blog/page/2/ also blocks any URL whose path contains /page/, including /products/page/overview/.
The Problem
Robots.txt pattern matching is literal prefix matching with two wildcards, not regex. Many developers write patterns assuming regex behaviour. Disallow: /*.pdf$ correctly blocks PDF files. But Disallow: /search* behaves exactly like Disallow: /search: every rule is a prefix match, so the trailing * is redundant, and both forms block any URL whose path begins with /search, including /search-engine-marketing/ blog posts.
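The matching rules above can be sketched in a few lines of Python. This is a simplified model of Google-style robots.txt matching (not Google's actual parser): translate * into "any sequence" and a trailing $ into an end anchor, and treat everything else as a literal prefix. The helper names are illustrative, not from any library.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Model a robots.txt path pattern as a regex (simplified sketch).

    '*' matches any sequence of characters, a trailing '$' anchors
    the end of the URL, and everything else is a literal prefix
    match. No other regex syntax applies.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore '*' as '.*'
    regex = re.escape(pattern).replace(r"\*", ".*")
    # Prefix match unless '$'-anchored
    return re.compile("^" + regex + ("$" if anchored else ""))

def is_blocked(pattern: str, path: str) -> bool:
    return robots_pattern_to_regex(pattern).match(path) is not None

# 'Disallow: /search' is a prefix match, so it also blocks these:
print(is_blocked("/search", "/search-engine-marketing/"))   # True
print(is_blocked("/search", "/search-results/"))            # True
# '$' anchors the end, so '/search$' blocks only /search itself:
print(is_blocked("/search$", "/search"))                    # True
print(is_blocked("/search$", "/search-engine-marketing/"))  # False
```

Running the checks shows why the prefix rule surprises people: nothing in /search says "stop at a word boundary", so any longer path that starts with those characters is caught.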
The Fix
# OVERLY BROAD: blocks /search-engine-tips/, /search-results/, etc.
# Disallow: /search

# CORRECT: only blocks /search itself and /search?q= style URLs
Disallow: /search$
Disallow: /search?

# OVERLY BROAD: blocks every URL with ? anywhere
# Disallow: /*?

# CORRECT: only block specific query parameters, on any path
# (each pattern matches when the parameter comes first; add /*&sort=
# style rules if the parameter can appear later in the query string)
Disallow: /*?sort=
Disallow: /*?filter=
Disallow: /*?ref=
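Before deploying, it helps to sanity-check the corrected rules against real URLs from your site. The sketch below uses the same simplified Google-style matching model as above (prefix match, * as any sequence, trailing $ as end anchor); the rule list and sample URLs are illustrative.

```python
import re

def matches(pattern: str, path: str) -> bool:
    # '*' -> any sequence; trailing '$' -> end anchor; else prefix match
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "^" + re.escape(body).replace(r"\*", ".*") + ("$" if anchored else "")
    return re.match(regex, path) is not None

# The corrected Disallow rules from the fix above
rules = ["/search$", "/search?", "/*?sort=", "/*?filter=", "/*?ref="]

urls = [
    "/search",                # blocked by /search$
    "/search?q=shoes",        # blocked by /search?
    "/search-engine-tips/",   # allowed: no longer a bare prefix match
    "/products/?sort=price",  # blocked by /*?sort=
    "/products/featured/",    # allowed
]

for url in urls:
    blocked = any(matches(rule, url) for rule in rules)
    print(f"{url}: {'BLOCKED' if blocked else 'ALLOWED'}")
```

This mirrors the BLOCKED/ALLOWED output a URL tester gives you, so you can catch an overly broad pattern locally before it ever reaches production.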
Use the ConfigClarity robots.txt Validator's URL Tester to check your patterns against specific URLs before deploying. The tester shows BLOCKED or ALLOWED for any URL against any bot, using the same parsing rules Google documents for Googlebot.
Validate your robots.txt live — fetch any URL and get a corrected file in one click.
Open robots.txt Validator →