Let us start this article with a much-needed clarification: Google does not have a duplicate content penalty. It does not ban or down-rank websites simply because the same content appears on more than one URL. However, it discourages the practice. When Google finds several versions of a page, it typically picks one to show in search results, and it may not pick the one you want. The distinction matters, but the conclusion is the same: duplicate content is best avoided. So today we will show you how to solve duplicate content issues.
What Is Duplicate Content?
As far as search engines are concerned, duplicate content is any substantial piece of text that appears at two or more different URLs. What gives a page its unique identity is its URL, so Google treats any variation between URL formats, no matter how small, as two different URLs.
This is one of the main reasons you may have duplicate content online without being aware of it. So, before we show you how to solve duplicate content problems, we will explain how they occur.
How and Why Do You Get Duplicate Content?
Your website may generate different URLs with identical content in many ways that have nothing to do with a conscious decision to trick Google or use black hat SEO tactics. Here are some of the most frequent situations that generate unintentional duplicate content:
1. URL Structure Variations
When users type in your website URL, they may or may not put www before the domain name, yet they reach the same page either way. That seems convenient, which is why many webmasters never set a preference for the www or non-www URL format.
However, this means that Google sees two pages, www.domain.com/blog/article and domain.com/blog/article, with identical content. That is not what you intended, and you certainly don’t want two versions of the same page competing for rankings.
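To make the point concrete, here is a minimal Python sketch, using only the standard library's `urllib.parse`, that compares the two addresses from the example as plain strings and then folds them into one preferred host. The `normalize_host` function and its `prefer_www` parameter are illustrative names, not part of any SEO tool, and the `https://` scheme is added only because `urlsplit` needs a scheme to find the host.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_host(url: str, prefer_www: bool = False) -> str:
    """Rewrite a URL so its host uses a single www or non-www form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip any leading "www.", then add it back only if that is the preference.
    # Simplified: ignores ports and user:password@ prefixes in the host part.
    bare = host[4:] if host.startswith("www.") else host
    new_host = "www." + bare if prefer_www else bare
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, parts.fragment))


a = "https://www.domain.com/blog/article"
b = "https://domain.com/blog/article"
print(a == b)                                  # False: two distinct URLs
print(normalize_host(a) == normalize_host(b))  # True: one preferred form
```

Tidying URLs in your own tools does not change what Google crawls, though; for that you need one of the fixes described in the "How to Solve Duplicate Content Problems" section.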
2. Switching from HTTP to HTTPS
Using a secure HTTPS address is a good thing: it gives visitors confidence, especially if you are running an ecommerce website. However, you may have started out with plain HTTP and switched to HTTPS later on.
If so, many of your pages may still be reachable at their old HTTP addresses after the transition, alongside the new HTTPS ones.
3. Session ID Pages
When people put products in a shopping cart but leave the website without completing the purchase, you want them to find the same cart the next time they visit your site. One way to do this is to create a session ID that preserves the information they created (i.e. the products in the shopping cart).
If that session ID is added to the page address, then while the session is active, each product page exists both at its regular URL and at a session ID URL.
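Assuming the session ID is carried as a query parameter, the sketch below shows how such URLs collapse back to the real product page once the parameter is removed, which can be handy when auditing crawl reports or server logs. The parameter names in `SESSION_PARAMS` and the example URLs are hypothetical; check which name your shop software actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names (compared case-insensitively).
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Return the URL with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # urlencode re-encodes the remaining parameters; an empty list gives "",
    # and urlunsplit then omits the "?" entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


print(strip_session_id("https://domain.com/shop/red-shoes?sessionid=a1b2c3&size=42"))
print(strip_session_id("https://domain.com/shop/red-shoes?size=42"))
# Both lines print the same address, showing the two URLs are one product page.
```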
How to Solve Duplicate Content Problems
Here are some effective ways of dealing with different URLs containing identical content:
1. 301 Redirects
This option works for HTTP-to-HTTPS transitions, for settling on a www or non-www address, and for other cases where pages move, such as a website redesign. Once you add a 301 (permanent) redirect rule to your server configuration or website code, anyone who types in or clicks on the old URL is sent to the new one, and search engine crawlers are pointed there too.
The old URL no longer serves a separate copy of the page; the content is available only at the new URL.
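Most sites set up 301 redirects in their web server or hosting control panel, but the logic is easy to see in code. Below is a minimal sketch written as a WSGI middleware using only the Python standard library; the preferred scheme and host (`https`, `domain.com`) and the `canonical_redirect` name are assumptions for illustration. Any request that does not arrive on the preferred scheme and host receives a 301 pointing to the same path there.

```python
def canonical_redirect(app, scheme: str = "https", host: str = "domain.com"):
    """Wrap a WSGI app so requests for a non-preferred URL get a 301 to the preferred one."""

    def wrapper(environ, start_response):
        # Behind a TLS-terminating proxy, wsgi.url_scheme may read "http" even
        # for HTTPS visitors; adjust this check to your setup.
        req_scheme = environ.get("wsgi.url_scheme", "http")
        # HTTP_HOST may include a port (e.g. "domain.com:8000"); not handled here.
        req_host = environ.get("HTTP_HOST", "").lower()
        if req_scheme == scheme and req_host == host:
            # Already on the preferred URL: serve the page normally.
            return app(environ, start_response)

        # Rebuild the same path and query on the preferred scheme and host.
        # Simplified: assumes the app is mounted at the root and paths are plain ASCII.
        target = f"{scheme}://{host}{environ.get('PATH_INFO', '') or '/'}"
        query = environ.get("QUERY_STRING", "")
        if query:
            target += "?" + query
        start_response("301 Moved Permanently", [("Location", target), ("Content-Length", "0")])
        return [b""]

    return wrapper


def page(environ, start_response):
    """Stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"article"]


app = canonical_redirect(page)

if __name__ == "__main__":
    # Demo without a network: simulate a request for the old HTTP, www address.
    def show(status, headers):
        print(status, dict(headers).get("Location"))

    app({"wsgi.url_scheme": "http", "HTTP_HOST": "www.domain.com",
         "PATH_INFO": "/blog/article", "QUERY_STRING": ""}, show)
```

Handling both the scheme and the host in one rule means a visitor on http://www.domain.com reaches the preferred address in a single hop rather than through a chain of redirects.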
2. Use Hreflang Tag for International Pages
Companies that serve multiple territories may have near-identical pages for each country. The differences are often slight, such as prices in a different currency or American versus British spelling. However, Google may still recognise these pages as duplicate content.
To solve the duplicate content issue in this case, add hreflang tags that tell Google which language and region each URL is intended for. This also helps Google show the correct page in search results to users in different countries.
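Hreflang annotations are `<link rel="alternate">` elements in each page's `<head>`, where the `hreflang` value is a language code optionally followed by a region code (for example `en-gb`), and the special value `x-default` marks the fallback page. The small Python sketch below generates that block; the `hreflang_links` helper and the /us/ and /uk/ URLs are hypothetical. Each regional version should carry the full set, including a link to itself.

```python
from __future__ import annotations

from html import escape


def hreflang_links(alternates: dict[str, str], default_url: str | None = None) -> str:
    """Build hreflang <link> elements for every regional version of a page.

    `alternates` maps a language-region code (e.g. "en-gb") to that version's URL.
    The same block goes into the <head> of every version, including itself.
    """
    lines = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in alternates.items()
    ]
    if default_url:
        # x-default names the page for visitors who match none of the listed regions.
        lines.append(f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}" />')
    return "\n".join(lines)


print(hreflang_links(
    {"en-us": "https://domain.com/us/", "en-gb": "https://domain.com/uk/"},
    default_url="https://domain.com/",
))
```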
3. Use a Canonical Tag to Separate the Main Page from the Printer-Friendly Version
When you offer users helpful content such as instructions for use, tutorials or cheat sheets, you may also create a printer-friendly version of the page that contains only the text (no graphics, menu bar or photos).
This is really helpful for people who want to print your articles without wasting printer ink. But it also means that Google's bots find a second URL with the same content when they crawl your site. To solve this, add a canonical tag (rel="canonical") to the printer-friendly page that points to the main page as the preferred URL for indexing.
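As an illustration, assume printer-friendly pages live at the main page's URL plus a `/print/` suffix (a hypothetical convention; your site may use a query parameter instead). The Python sketch below derives the main URL and builds the `<link rel="canonical">` element to place in the `<head>` of the printer-friendly page; `main_url_for` and `canonical_tag` are illustrative names.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical convention: the printer-friendly copy of /guides/setup/
# lives at /guides/setup/print/.
PRINT_SUFFIX = "/print/"


def main_url_for(print_url: str) -> str:
    """Map a printer-friendly URL back to the main page URL."""
    parts = urlsplit(print_url)
    path = parts.path
    if path.endswith(PRINT_SUFFIX):
        path = path[: -len(PRINT_SUFFIX)] + "/"
    # Query string and fragment are dropped: the canonical URL is the clean main page.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def canonical_tag(print_url: str) -> str:
    """Build the canonical <link> element for the printer-friendly page's <head>."""
    return f'<link rel="canonical" href="{escape(main_url_for(print_url))}" />'


print(canonical_tag("https://domain.com/guides/setup/print/"))
```

Unlike a redirect, the printer-friendly page stays available to visitors; the tag only tells Google which URL to index.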
4. Deal with Rogue Duplicate Content
In some cases, solving duplicate content issues requires legal action. This is the case when other websites scrape your content, that is, copy it onto their own pages. This is copyright infringement, and Google does not condone it at all.
To solve this specific and, sadly, widespread issue, check out the steps for reporting copyright infringement to Google at this resource.
5. Don’t Use Boilerplate Content
In ecommerce, boilerplate content usually means the product descriptions provided by manufacturers. Because all resellers and distributors use the same text, it creates a massive amount of duplicate content.
The best way to deal with this issue is to keep the relevant specifications and write your own unique product descriptions. A little creative work goes a long way.
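Before publishing, you can roughly check how much of the manufacturer's wording survives in your version. The sketch below uses the standard library's `difflib.SequenceMatcher` to compare the word sequences of two descriptions and return a score between 0 and 1. This is only a self-check under our own assumptions, not how Google measures duplication, and the 0.8 threshold and sample texts are hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.8  # hypothetical cut-off; tune it for your catalogue


def words(text: str) -> list[str]:
    """Lower-case the text and split it into words, ignoring extra whitespace."""
    return text.lower().split()


def similarity(a: str, b: str) -> float:
    """Return 0..1: how much of their word sequence the two descriptions share."""
    return SequenceMatcher(None, words(a), words(b)).ratio()


manufacturer = "Stainless steel kettle with a 1.7 litre capacity and automatic shut-off."
ours = ("Boil enough water for the whole family in one go: 1.7 litres, "
        "a stainless steel body and automatic shut-off for peace of mind.")

score = similarity(manufacturer, ours)
print(f"similarity: {score:.2f}")
if score >= SIMILARITY_THRESHOLD:
    print("Too close to the manufacturer's text; rewrite it further.")
```

Comparing words rather than characters keeps the score focused on reused phrasing, while the shared specifications (capacity, material) naturally still overlap.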
Last but not least, don’t forget that content syndication, although helpful for reaching new audiences, also creates duplicate content. Here, your main worry is that the site republishing your article has more authority than your own domain, in which case Google may show its copy in search results instead of yours. That can cost you potential organic traffic, so if you see a drop in your monthly traffic numbers, it is worth solving this duplicate content issue.