When It Happens and How to Avoid It
Websites are most vulnerable to this traffic killer just after the release of a new design. That’s because the web designer will want to please the client by showing progress and getting feedback on different iterations.
Often, the web designer will create a subdomain for the new website, such as newdesign.example.com. This creates a bit of an SEO problem: newdesign.example.com may get indexed by search engines, which creates duplicate content, and that isn’t any good for SEO.
So, if the web designer is savvy, they’ll block search engines from newdesign.example.com by adding a robots.txt file. This is a two-minute job, and it will prevent search engines from crawling the new subdomain.
It’s a regular plain text file and will look like this:
User-agent: *
Disallow: /
So far, so good.
Google’s web crawler is known as Googlebot, and its job is to discover and index pages. In robots.txt terms, a crawler like this is known as a user-agent. Before it crawls a site, it checks the robots.txt file to learn which areas it can and can’t crawl. It follows these instructions to the letter.
In User-agent: *, the * acts as a wildcard, which means the rule below it applies to all user-agents (including Googlebot).
In this case, the forward slash in Disallow: / indicates that none of the content on the new subdomain should be crawled.
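The effect of these two directives can be tried out locally. The following is a minimal sketch using Python’s standard-library urllib.robotparser, not Googlebot’s own parser, so treat it as an approximation. The helper name is_allowed, the ExampleBot user-agent and the about.html page are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two directives discussed above: every user-agent, whole site disallowed.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (per Python's parser)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The * wildcard means the rule applies to Googlebot and to any other crawler.
googlebot_ok = is_allowed(BLOCK_ALL, "Googlebot", "https://newdesign.example.com/")
# ExampleBot and about.html are hypothetical names used only for this demo.
other_ok = is_allowed(BLOCK_ALL, "ExampleBot", "https://newdesign.example.com/about.html")
print(googlebot_ok, other_ok)
```

Both checks come back False, because the single forward slash matches every path on the subdomain.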
Now for the Little Mistake That Has Big Consequences
By the time a new design gets signed off, it is often behind schedule. So it’s usually a rush to get the new design live on the main website (e.g. example.com).
The designer will then copy all the files from the development subdomain (e.g. newdesign.example.com), and this typically includes the robots.txt file:
User-agent: *
Disallow: /
If the robots.txt file remains unchanged and goes live on the main site, it’s like a road worker holding up a big red stop sign. During this time, no (SEO) traffic will be allowed through. The sign only changes to green and welcomes Google back when the robots.txt returns to normal and the forward slash is removed, like this:
User-agent: *
Disallow:
Really easy to fix, but a really easy mistake to make too, wouldn’t you say?
This happens 5 to 10% of the time, but it usually isn’t a big problem because it’s discovered within a day or so.
Why don’t you check your site right now? The robots.txt file always lives at the root of your website, so its address is your domain name followed by /robots.txt.
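If you’d rather check from a script, here is one possible sketch that downloads a site’s robots.txt and reports whether it shuts Googlebot out of the whole site. It relies on Python’s standard-library urllib.robotparser, which only approximates how Googlebot reads the file. The function names blocks_whole_site and fetch_robots_txt are hypothetical, and the site address is whatever you pass on the command line.

```python
import sys
from urllib.error import HTTPError
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt bars user_agent from the root path "/"."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


def fetch_robots_txt(site: str) -> str:
    """Download robots.txt from the root of site, e.g. https://example.com."""
    url = site.rstrip("/") + "/robots.txt"
    try:
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return ""  # No robots.txt at all, so nothing is disallowed.
        raise


# Only touches the network when a site is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    if blocks_whole_site(fetch_robots_txt(sys.argv[1])):
        print("Warning: robots.txt is blocking Googlebot from the whole site.")
    else:
        print("OK: robots.txt is not blocking Googlebot from the homepage.")
```

Saved under a name of your choice, it could be run as, say, python check_robots.py https://example.com straight after a new design goes live.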
And always remember, if you have a new website and you’ve used a designer or developer to help you, check the robots.txt file when the new design goes live.