Check your robot directives!
This has been written about more times than I can count, and I never thought I’d find myself writing about it. Yet here I am. I feel compelled after recently seeing two sites with a robots directive that kept the search engines from indexing the site. This is a big problem, though an easily solved one. Disallowing the search engines from crawling your site will ensure that you don’t rank and don’t get organic traffic. Neither of those is good. So it’s worth doing yourself a favor and checking!
What is a robots directive?
A robots directive tells the search engines which areas of a site can be crawled and which can’t. Directives can be set up on the page, in the robots.txt file, or both. If you have a WordPress site with an SEO plugin, there’s probably a robots.txt file that prevents some files from being crawled, like admin files, along with page-level directives for web pages and posts. The robots.txt file can also point search engines to your XML sitemap. Think of the file as a welcome mat that points straight to the buffet.
How can you tell if your robots directives are set up right?
To check whether you have a robots.txt file, type yourdomain.com/robots.txt in a web browser. If you get a 404 error (page not found), then you don’t have one. If it loads a mostly blank page with a few lines of plain text, then you have one.
The first line, user-agent, identifies which search engine the commands are for. Typically, what’s good for one is good for another, and you can give all of the search engines the same directive by using an asterisk. The line looks like this:
User-agent: *
The next line is the disallow line. This tells the search engine what not to crawl.
The following would block all search engines:
User-agent: *
Disallow: /
The following would allow search engines into every nook and cranny of your site:
User-agent: *
Disallow:
Notice it doesn’t have the slash. That’s key.
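If you’d rather not eyeball the slash, here’s a minimal sketch in Python that checks it for you with the standard library’s urllib.robotparser. The helper name can_crawl and the yourdomain.com URL are just placeholders I made up for illustration, and the two sample files are the block-everything and allow-everything cases described above.

```python
from urllib.robotparser import RobotFileParser

# Two sample robots.txt bodies: one with the slash, one without.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def can_crawl(robots_txt, url, agent="*"):
    """Return True if `agent` may crawl `url` under the given robots.txt text.

    `can_crawl` is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://yourdomain.com/about/"  # placeholder URL
    print("With the slash:", can_crawl(BLOCK_ALL, page))
    print("Without the slash:", can_crawl(ALLOW_ALL, page))
```

With the slash, the page should come back as not crawlable; without it, crawlable.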
Here’s one that just blocks a few directories and one that you’ll see on a WordPress site.
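As a rough illustration of directory-level blocking, the sketch below feeds a WordPress-style file to Python’s urllib.robotparser and reports which paths are blocked. The two directories in it, /wp-admin/ and /wp-includes/, are my assumption of what such a file commonly blocks; your own file may list different ones. The helper name blocked_paths and the sample paths are also made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed WordPress-style rules; the directory names are illustrative only.
WORDPRESS_STYLE = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
"""


def blocked_paths(robots_txt, paths, agent="*"):
    """Return the subset of `paths` that `agent` is not allowed to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    # Hypothetical paths to test against the rules above.
    sample = ["/", "/blog/my-post/", "/wp-admin/options.php", "/wp-includes/js/app.js"]
    print(blocked_paths(WORDPRESS_STYLE, sample))
```

Only the paths under the listed directories should show up as blocked, while the home page and blog posts stay crawlable.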
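The last line that you might see is the sitemap line. It looks like this:
Sitemap: http://www.yourdomain.com/sitemap.xml
Here’s one that allows everything with the sitemap:
User-agent: *
Disallow:
Sitemap: http://www.yourdomain.com/sitemap.xml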
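To confirm that a sitemap line is actually being picked up, a small sketch like this one can help. It assumes Python 3.8 or later, where urllib.robotparser gained the site_maps() method, and the sitemap URL is a placeholder rather than a real address. The helper name listed_sitemaps is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Allow everything and advertise a sitemap; the URL is a placeholder.
ALLOW_ALL_WITH_SITEMAP = """\
User-agent: *
Disallow:
Sitemap: http://www.yourdomain.com/sitemap.xml
"""


def listed_sitemaps(robots_txt):
    """Return the sitemap URLs named in the robots.txt text (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap line.
    return parser.site_maps() or []


if __name__ == "__main__":
    print(listed_sitemaps(ALLOW_ALL_WITH_SITEMAP))
```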
Another way to tell is to search for your company name and see whether something like this appears as the description under your listing:
A description for this result is not available because of this site’s robots.txt – learn more.
If you check and see something that blocks important directories of your site, or a disallow that blocks the entire site (with the slash), then you’re probably not getting any organic search traffic, although you might still be getting some from searches for your company name.
Checking your on-page robots directive means viewing the page’s source code (different browsers have different ways of doing this) and checking to see if there is a line of code that looks like this:
<meta name="robots" content="index,follow" />
The easiest way to find it is to search the source for the word ‘robots’. The line above tells the search engine to index the page and follow all of the links on the page. If you see one of these instead:
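<meta name="robots" content="noindex,nofollow" />
<meta name="robots" content="noindex,follow" />
<meta name="robots" content="index,nofollow" />
then you could have misconfigured directives. A noindex keeps the page out of the search results, and a nofollow tells the search engine not to follow the links on the page.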
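If you have more than a handful of pages to check, searching the source by hand gets old fast. Here’s a minimal sketch that pulls the robots meta directives out of a page’s HTML using Python’s built-in html.parser. The class and function names are ones I made up for this example, and it only looks at meta tags named "robots", not at HTTP headers or crawler-specific tags.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content looks like "noindex,follow"; split it into single directives
        for part in attr_map.get("content", "").split(","):
            if part.strip():
                self.directives.append(part.strip().lower())


def robots_meta_directives(html):
    """Return the list of robots meta directives found in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


def looks_misconfigured(html):
    """True if the page asks not to be indexed or not to have its links followed."""
    found = robots_meta_directives(html)
    return "noindex" in found or "nofollow" in found


if __name__ == "__main__":
    page = '<head><meta name="robots" content="noindex,follow" /></head>'
    print(robots_meta_directives(page), looks_misconfigured(page))
```

A page flagged here isn’t necessarily wrong, since some pages are meant to stay out of the index, but it’s worth a second look.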
When should you keep a page or site from being indexed?
If you’re going through a redesign and your new site is accessible from a public-facing URL like www.new-site.com or dev.domain.com, then you’ll want to block that content from being indexed until you’re ready for it to go live.
Another time it may be helpful to block content is for PPC pages that are meant only for paid traffic and contain duplicate content. Setting this up wrong can drastically decrease your organic traffic, so if you aren’t sure, it’s probably best to ask an SEO expert.