Google's John Mueller has bestowed a golden nugget of knowledge upon the web development community; speaking in a recent Google Search Central SEO hangout, Mueller provided an eye-opening insight into the way Google's crawlers detect and deal with duplicate content.
Specifically, he spoke of how a predictive method is used to check URLs for patterns, which can result in pages with similar URLs being flagged as duplicates.
This is one of many measures Google has adopted to reduce the number of duplicate pages being crawled and indexed – pages interpreted as duplicates on the basis of their URLs alone are simply passed over.
Consequently, countless webmasters and online businesses risk having their unique pages overlooked by Google if the URLs are too similar.
Irrespective of the quality and uniqueness of the content, pages with URLs too similar to one another could automatically be interpreted as duplicates.
As Mueller explained:
“Having multiple levels of trying to understand when there is duplicate content on a site is difficult. When we look at the page's content directly, we see this page has this content, this page has different content, we should treat them as separate pages.”
He then went on to explain why Google works in this way: it is simply not in the search engine's best interest to index pages suspected of being duplicates.
“Even without looking at the individual URLs, we can save ourselves some crawling and indexing by just focusing on these assumed or very likely duplication cases. I’ve seen this happen with things like cities.”
“I have seen this happen with automobiles; essentially our systems recognize that what you specify as a city name is something that is not so relevant for the actual URLs. Usually we learn that kind of pattern when a site provides a lot of the same content with alternate names.”
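To picture the kind of pattern Mueller is describing, here is a rough, purely illustrative sketch in Python of how URLs that differ only by a city segment might be grouped as likely duplicates. Google's actual systems are not public, so the approach, URLs, and city names below are invented for the example.

    # Illustrative sketch only: a simplified, hypothetical model of grouping URLs
    # that differ only by a low-relevance segment (e.g. a city name).
    from urllib.parse import urlparse
    from collections import defaultdict

    CITY_NAMES = {"springfield", "shelbyville", "ogdenville"}  # invented example values

    def url_pattern(url: str) -> str:
        """Collapse path segments that look like city names into a placeholder."""
        parts = urlparse(url).path.strip("/").split("/")
        normalized = ["{city}" if p.lower() in CITY_NAMES else p for p in parts]
        return "/" + "/".join(normalized)

    def group_probable_duplicates(urls):
        """Group URLs whose paths collapse to the same pattern."""
        groups = defaultdict(list)
        for url in urls:
            groups[url_pattern(url)].append(url)
        return {pattern: members for pattern, members in groups.items() if len(members) > 1}

    urls = [
        "https://example.com/rentals/springfield/apartments",
        "https://example.com/rentals/shelbyville/apartments",
        "https://example.com/rentals/ogdenville/apartments",
    ]
    print(group_probable_duplicates(urls))
    # {'/rentals/{city}/apartments': [all three URLs]}

In this toy model, any pages whose URLs collapse to the same pattern would be treated as likely duplicates before their content is ever compared – which is the risk Mueller is flagging for sites with many near-identical location pages.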
Mueller went on to offer a few helpful tips to those who may be struggling with duplicate (or suspected duplicate) content affecting their SEO performance.
“What I would do in a case like this is see if you have strong overlaps of content, and then minimize it.”
“And that could be by using something like a rel canonical on the page and saying this small city that is right outside the big city, I’ll set the canonical to the big city because it shows exactly the same content.”
“So that really, for every URL that we crawl on your website and index, it is visible that this URL and its content are unique, and it is important for us to keep all of these URLs indexed.”
“Or we see clear information that this URL is supposed to be the same as this other one – you have maybe set up a redirect or you have a rel canonical set up – and we can just focus on those main URLs and still understand that the city aspect there is critical for your individual pages.”
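As a rough illustration of the rel canonical approach Mueller describes, the hypothetical sketch below maps small-city pages to the big-city page that carries the same content and emits the corresponding link tag for each page's head section. The paths and mapping are invented for the example, not a prescribed implementation.

    # Illustrative sketch only: one way a site might declare a big-city page as the
    # canonical for nearby small-city pages that show the same listings.
    CANONICAL_MAP = {
        "/rentals/shelbyville/apartments": "/rentals/springfield/apartments",
        "/rentals/ogdenville/apartments": "/rentals/springfield/apartments",
    }

    def canonical_tag(path: str, base: str = "https://example.com") -> str:
        """Return the rel=canonical link tag this page should include in its <head>."""
        target = CANONICAL_MAP.get(path, path)  # pages without an entry point to themselves
        return f'<link rel="canonical" href="{base}{target}">'

    print(canonical_tag("/rentals/shelbyville/apartments"))
    # <link rel="canonical" href="https://example.com/rentals/springfield/apartments">

Whether a canonical or a redirect is the better consolidation signal depends on whether the duplicate pages still need to be reachable by users; either way, the goal is the one Mueller describes – giving Google a clear indication of which URL should represent the shared content.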