WordPress generates a virtual robots.txt file automatically: you won’t find a physical file on disk, but visiting yourdomain.com/robots.txt returns content that WordPress builds on the fly. The default output is minimal. It disallows /wp-admin/, allows /wp-admin/admin-ajax.php, and that’s it. For most production sites this isn’t enough: search result pages, paginated archives, and RSS feeds can all be crawled and potentially indexed, wasting crawl budget and creating thin-content issues.
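For reference, the default virtual output looks like this (on WordPress 5.5 or newer, core may also append a Sitemap: line pointing at its built-in wp-sitemap.xml index):

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php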
This snippet uses WordPress’s built-in robots_txt filter to replace the default output with a more complete rule set.
The Code
Add this to your functions.php or a site-specific plugin. The filter receives two arguments: $output (the current robots.txt content) and $public (the value of the blog_public option, which is truthy when the site is public and falsy when “Discourage search engines from indexing this site” is checked in Settings → Reading).
add_filter( 'robots_txt', function ( $output, $public ) {
	// Respect the 'Discourage search engines' setting: keep WordPress's
	// default blanket Disallow when the site is not public.
	if ( ! $public ) {
		return $output;
	}

	$site_url = get_bloginfo( 'url' );

	$output  = "User-agent: *\n";
	$output .= "Disallow: /wp-admin/\n";
	$output .= "Allow: /wp-admin/admin-ajax.php\n";
	$output .= "Disallow: /wp-includes/\n";
	$output .= "Disallow: /?s=\n";     // Block default search result URLs
	$output .= "Disallow: /search/\n"; // Block pretty-permalink search URLs
	$output .= "Disallow: /page/\n";   // Block paginated archives
	$output .= "Disallow: /feed/\n";   // Block feed URLs from being crawled
	$output .= "\n";
	$output .= "Sitemap: {$site_url}/sitemap.xml\n";

	return $output;
}, 10, 2 );
What Each Rule Does
The Disallow: /wp-admin/ and Allow: /wp-admin/admin-ajax.php pair is standard: it blocks bots from the admin area while allowing the AJAX endpoint that some front-end features depend on.
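For context, front-end features reach admin-ajax.php through the wp_ajax_nopriv_ hooks. A minimal sketch of such an endpoint, with an invented myprefix_load_more action name, looks like this:

// Hypothetical logged-out AJAX endpoint served through /wp-admin/admin-ajax.php.
// Requests arrive at admin-ajax.php?action=myprefix_load_more, which is why the
// Allow rule matters even though the rest of /wp-admin/ is blocked.
add_action( 'wp_ajax_nopriv_myprefix_load_more', function () {
	$page = isset( $_GET['page'] ) ? absint( $_GET['page'] ) : 1;
	wp_send_json_success( array( 'page' => $page ) );
} );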
Disallow: /wp-includes/ prevents bots from crawling WordPress core files. While these files aren’t typically indexed, skipping those crawl requests preserves crawl budget for your actual content pages. One caveat: wp-includes also serves scripts and styles (jQuery, for example) that Googlebot fetches when rendering pages, so some SEOs advise leaving it crawlable; if you keep the rule, check how pages render with Search Console’s URL Inspection tool.
Disallow: /?s= blocks WordPress’s default search results URL format (yourdomain.com/?s=term). Search results pages are low-quality, duplicate-prone pages that offer no SEO value and can actively harm your crawl budget and site quality signals if indexed. Disallow: /search/ covers the pretty-permalink version of the same URLs.
Disallow: /page/ blocks paginated archive pages. This is a matter of preference: some SEO strategies prefer keeping paginated archives crawlable, but on smaller sites where most content is reachable from the first page of an archive, blocking pagination reduces indexable thin content. Keep in mind that robots.txt rules match from the start of the URL path, so /page/ covers the main blog index pagination (yourdomain.com/page/2/) but not nested archive pagination such as /category/news/page/2/.
Disallow: /feed/ prevents the site’s RSS and Atom feed URLs from being crawled. Feed content is duplicate by definition: it mirrors post content already available at canonical URLs. As with /page/, the rule is prefix-matched, so per-post comment feeds like /some-post/feed/ are not covered.
The Sitemap: directive tells any compliant bot where to find your XML sitemap. This is especially useful for crawlers that never see the sitemap you submitted through Google Search Console.
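One caveat: the snippet points at /sitemap.xml, which is where SEO plugins such as Yoast typically place their sitemap index. If you instead rely on WordPress’s built-in sitemaps (a core feature since 5.5), the index lives at /wp-sitemap.xml, so you would adjust that line accordingly:

// Assumes core sitemaps are active and no SEO plugin overrides them.
$output .= 'Sitemap: ' . home_url( '/wp-sitemap.xml' ) . "\n";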
The Public Check
The if ( ! $public ) return $output; line is important. When a site has the “Discourage search engines” option enabled in Settings → Reading, WordPress’s default robots.txt output includes a blanket Disallow: /. This snippet respects that by returning the original output unchanged when the site is set to discourage crawlers, ensuring your staging environments stay blocked even with this snippet active.
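For reference, with that option enabled the unfiltered default output is simply:

User-agent: *
Disallow: /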
Testing Your robots.txt
After adding the snippet, visit yourdomain.com/robots.txt directly to confirm the output. Google Search Console also provides a robots.txt report under Settings that shows the version Google last fetched and flags any parse errors. Note that changes to robots.txt take effect immediately for new crawls, but previously cached versions may persist for a short time; Google generally caches robots.txt for up to 24 hours.
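If WP-CLI is available on the server, you can also print the filtered output from the command line. This one-liner, assuming WP-CLI is installed and the snippet is active, runs the same filter WordPress applies when serving the virtual file:

wp eval "echo apply_filters( 'robots_txt', '', get_option( 'blog_public' ) );"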