Semalt Expert: A Guide To Preventing Google From Crawling Old Sitemaps

As your website grows, you will look for ways to improve its visibility and credibility on the internet. Sometimes, traces of how the site used to work are left behind, and these deserve your attention.

Read the following tips from Max Bell, the Customer Success Manager of Semalt, to prevent Google from crawling old sitemaps.

A few weeks ago, one of my clients told me that he had an e-commerce website. It went through various changes: from the URL structure to the sitemap, everything was modified to make the site more visible.

The customer noticed some changes in his Google Search Console and found Crawl Errors there. He observed that a large number of old and new URLs were generating fake traffic. Some of them, however, were returning Access Denied (403) and Not Found (404) errors.

My customer told me that his biggest problem was an old sitemap that still existed in the root folder. His website had previously used a variety of Google XML Sitemaps plugins, but he now depended on WordPress SEO by Yoast for the sitemap. The old sitemap plugins, however, had left a mess behind: a file named sitemap.xml.gz was still present in the root folder. Since he had started using the Yoast plugin to create sitemaps for all posts, pages, categories and tags, he did not need those plugins anymore. He had never submitted sitemap.xml.gz to Google Search Console; he had only submitted his Yoast sitemap. Even so, Google was crawling his old sitemap too.

What to crawl?

He had not deleted the old sitemap from the root folder, so it was still being crawled. I got back to him and explained that a sitemap is only a suggestion of what a search engine should crawl. You probably think that deleting the old sitemap will stop Google from crawling the defunct URLs, but that’s not true. In my experience, Google requests each old URL again several times a day, making sure that the 404 errors are real and not an accident.

Googlebot remembers the old and new links that it finds in your site's sitemap. It visits your website at regular intervals, making sure that every page is indexed correctly. Googlebot tries to evaluate whether the links are valid or invalid so that visitors don’t experience any problems.

Webmasters are understandably confused when the number of their Crawl Errors increases, and all of them want to bring it down. How do you tell Google to disregard all old sitemaps? You can do so by shutting down the unwanted and odd sitemap crawls. Previously, the only way to do this was to edit the .htaccess file by hand. Today, some WordPress plugins can handle it for you as well.

WordPress websites keep the .htaccess file in their root folder. To edit it, connect over FTP, or enable hidden files in cPanel and open the file from the File Manager option. Do not forget that a wrong edit can break your site, so you should always back up all your data first.
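
The exact snippet is not reproduced in this article, so what follows is only a minimal sketch of what such a rule could look like. It assumes an Apache server with mod_alias enabled and that the stale file is the sitemap.xml.gz in the root folder mentioned earlier; the path is an assumption you should adjust to your own site. The rule answers any request for the old sitemap with a 410 Gone status, which tells crawlers the file was removed on purpose.

```apache
# Hypothetical example, not the author's original snippet.
# Assumes Apache with mod_alias; change the path if your old sitemap lives elsewhere.
<IfModule mod_alias.c>
    # Respond with "410 Gone" to requests for the old plugin sitemap.
    Redirect gone /sitemap.xml.gz
</IfModule>
```

If you try something like this, place it outside the block between the "# BEGIN WordPress" and "# END WordPress" comments, because WordPress can overwrite anything inside those markers. Afterwards, requesting the old sitemap URL in a browser should return a 410 response while the rest of the site behaves as before.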

Once you have added the snippet to the file, the expired URLs should start to disappear from your Crawl Errors as Google recrawls them. Do not forget that Google wants you to keep your site live and to keep the chances of 404 errors low.