OFFICIAL: Google Cancels Support For NoIndex Directive In Robots.txt
It is now official – as of September 1st 2019, Googlebot will no longer obey the noindex directive inside robots.txt files.
What does this mean for publishers?
As of September 1st, publishers will need to amend the way in which they instruct Google to crawl and index their sites, omitting pages that they do not want indexed in Google's search results.
In truth, the noindex directive in robots.txt was never officially supported by Google; historically, however, Googlebot mostly recognised and honoured it – this will no longer be the case.
Unlike disallowed pages, pages marked noindex typically did not end up in Google's search index or results. Google was observed to honour the noindex directive in roughly 11 out of 12 cases over the past decade (as noted by Matt Cutts back in 2008), but that behaviour is no longer something publishers can rely on.
Adding disallow and noindex directives to your site's robots.txt file was a standard and effective way to prevent certain pages from appearing in Google search results:
“Today we’re saying goodbye to undocumented and unsupported rules in robots.txt
If you were relying on these rules, learn about your options in our blog post.
In the interest of maintaining a healthy ecosystem and preparing for potential future open source releases, we’re retiring all code that handles unsupported and unpublished rules (such as noindex) on September 1, 2019.” ~ Google’s official tweet regarding the Robots Exclusion Protocol (REP), July 1st
Gary Illyes has explained that Google found “the number of sites that were hurting themselves was very high.” He also highlighted that the update “is for the better of the ecosystem and those who used it correctly will find better ways to achieve the same thing.”
What’s The Difference Between NoIndex and NoFollow Meta Tags?
By default, pages are treated as “index” by search engine spiders, which crawl sites to include pages in their indexes.
The noindex directive is a commonly used value in a meta tag that can be added to the HTML source code of a webpage to instruct search engines (most notably Google) not to include that particular page in their search results.
Pages such as membership content pages, thank-you pages and pages of no relevance to search engines are typically marked up as “noindex”, as there is no value in them being indexed and presented in search engine results listings.
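For example, a thank-you page could carry the following tag in its HTML head – a minimal sketch (the surrounding page is hypothetical):

```html
<!-- Placed in the <head> of a page we don't want indexed, e.g. a thank-you page -->
<meta name="robots" content="noindex">
```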
The nofollow directive is a commonly used value in a meta tag that can be added to the HTML source code of a webpage to instruct search engines (most notably Google) not to pass link equity through any links on a given webpage.
External and internal links are of course an integral and important part to website optimisation in search engine results.
By utilising the ‘nofollow‘ directive, we’re telling website spiders that they may still discover and crawl the linked pages, but should not pass link equity across to the target page. Adding nofollow directives to blog comments and other commenting tools has gained popularity as a way to discourage spammers and the black-hat SEO tactic of spamming comments across the web to pass link juice and authority from a higher-authority domain to another website.
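As a sketch, nofollow can be applied either page-wide via a meta tag or to an individual link via the rel attribute (the URL below is purely illustrative):

```html
<!-- Page-level: ask crawlers not to pass link equity through any link on this page -->
<meta name="robots" content="nofollow">

<!-- Link-level: apply nofollow to a single link, e.g. in a blog comment -->
<a href="https://example.com" rel="nofollow">commenter's site</a>
```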
5 Methods: How To Control GoogleBot Crawling/Indexing Moving Forward
If you are currently reliant on the noindex directive in your robots.txt file, Google recommend 5 alternative options for achieving the same outcome after September 1st. These are:
- Noindex in robots meta tags
- 404 and 410 HTTP status codes
- Password protection
- Disallow in robots.txt
- Search Console Remove URL tool
Noindex robots meta tags: This is by far the most effective way to remove URLs from the index while still allowing Googlebot to crawl the page/s in question. The noindex rule is supported both as a robots meta tag in the page’s HTML and as an HTTP response header, and is applied on the web page itself.
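For non-HTML resources such as PDFs, the same rule can be delivered via the X-Robots-Tag HTTP response header instead of a meta tag. A sketch of an Apache configuration using mod_headers (the .pdf pattern is an assumed example) might look like this:

```apache
# Serve noindex as an HTTP header for all PDF files (requires mod_headers)
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
```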
404 and 410 HTTP status codes: These status codes are a recognised standard used to inform search engines that a page no longer exists, which will lead to pages being dropped from the index once they have been crawled.
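As a rough illustration of the difference, the sketch below (Python standard library only; the /retired-page path is hypothetical) serves a deliberately removed page with 410 Gone and everything else with 404 Not Found:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class GoneHandler(BaseHTTPRequestHandler):
    """Minimal handler distinguishing a deliberately retired page from a missing one."""

    def do_GET(self):
        if self.path == "/retired-page":
            # 410 Gone: signals the page was removed on purpose, so crawlers
            # can drop it from the index without waiting to re-check.
            self.send_response(410)
            self.end_headers()
            self.wfile.write(b"This page has been permanently removed.")
        else:
            # 404 Not Found: the page simply does not exist here.
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet
```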
Password protection: Another common method of preventing Googlebot from accessing a page is to hide it behind a login; password-protected pages will generally be removed from the index.
Disallow in robots.txt: A less reliable, yet popular, method is to use the disallow directive. Blocking a page from being crawled will in most cases prevent it from being indexed, as search engines can only index the pages they know about. A blocked page may still be indexed due to links pointing to it from other pages, but Google will aim to make such pages less visible in search results.
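For example, a robots.txt file at the site root could block crawling of a hypothetical admin section like so:

```text
User-agent: *
Disallow: /admin/
```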
Search Console Remove URL tool: The URL removal tool within Google Search Console is a quick and easy way to temporarily remove a URL from Google’s search results.
Other Robots.txt Changes To Be Confirmed
Reviewing the Google Developers Blog, the above exclusion of the noindex directive is just one of a number of changes being brought about in an effort to create an internet standard for the robots.txt protocol.
Read the official Google announcement here:
Read the official Google tweet here
MagnetMonster is a digital marketing agency with over three decades of combined experience in the industry. Our services have contributed towards building multimillion £/$ e-commerce brands with proven email marketing strategies. MagnetMonster is fiercely passionate about helping SMEs optimise their customer value journeys with effective content marketing and scalable digital strategies.