What is Duplicate Content and How Does It Affect SEO?
Even very successful websites get stymied by duplicate content. Think of it this way: Every time you create three or four versions of one of your pages, you’re competing against yourself three or four times before the page even enters the competitive market of search engine results pages.
People often have misperceptions about duplicate content and its effect on SEO, backlinks, and traffic, but we’re here to provide answers.
Whether your site consists of large numbers of templated pages or you’re just beginning the initial phases of web development, read on to avoid mistakes that could cost you valuable organic traffic.
What is duplicate content?
Strictly speaking, duplicate content is content that is identical or very similar and available at multiple locations (URLs), on or off your site.
In a broader sense, the term also covers content that offers little value to visitors, as well as pages that contain little body content.
As a rule of thumb, a ratio of more than three duplicate pages for every original page is considered excessive and is likely weighing down your SEO performance.
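To put a number on that ratio for your own site, here is a minimal Python sketch. It assumes you have already crawled your pages and extracted each page's body text into a dictionary keyed by URL; the helper names `fingerprint` and `duplicate_ratio` and the sample pages are hypothetical. It only catches exact duplicates after whitespace and letter case are normalized, so near-duplicates would need a fuzzier technique such as shingling.

```python
import hashlib
import re
from collections import Counter

# The rule of thumb from this article: more than 3 duplicates per original is excessive.
EXCESSIVE_RATIO = 3


def fingerprint(body_text: str) -> str:
    """Hash body text after collapsing whitespace and lowercasing it,
    so copies that differ only in spacing or case share a fingerprint."""
    normalized = re.sub(r"\s+", " ", body_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def duplicate_ratio(pages: dict[str, str]) -> float:
    """Return the number of duplicate pages per unique page.

    `pages` maps URL -> extracted body text. Each group of pages sharing a
    fingerprint counts as one unique page plus (group size - 1) duplicates.
    """
    groups = Counter(fingerprint(text) for text in pages.values())
    unique = len(groups)
    duplicates = sum(size - 1 for size in groups.values())
    return duplicates / unique if unique else 0.0


if __name__ == "__main__":
    # Hypothetical crawl results: four URLs serve the same product text.
    pages = {
        "https://example.com/shoes": "Red running shoes",
        "https://example.com/shoes?utm_source=mail": "Red running shoes",
        "http://example.com/shoes": "Red  Running shoes",
        "https://www.example.com/shoes": "red running shoes",
        "https://example.com/about": "About us",
    }
    ratio = duplicate_ratio(pages)
    print(f"{ratio} duplicate pages per unique page")  # 1.5 duplicate pages per unique page
    print("excessive" if ratio > EXCESSIVE_RATIO else "within the rule of thumb")
```

In this sample, three of the four shoe URLs count as duplicates of one original, giving 3 duplicates across 2 unique pages.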
Why is duplicate content bad for SEO?
Duplicate content causes problems for both search engines and site owners:
-
Search engines don’t know which version to include in or exclude from their indices, which makes it difficult for them to decide which version to rank for a given search query. They also can’t tell whether to consolidate link metrics (anchor text, link equity, authority, trust) on one page or keep them split across the separate versions.
-
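For site owners, the problem is visibility: search engines rarely show several versions of the same content, so they are forced to pick just one as the best result, which dilutes the visibility of each duplicate. Link equity can be diluted too, because other sites linking to the content also have to choose between the duplicates, spreading their inbound links across several URLs.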
Does duplicate content receive a Google penalty?
Google tried to squash myths surrounding duplicate content when Susan Moskwa posted on the Google Webmaster Central Blog in 2008:
Let’s put this to bed once and for all, folks: There’s no such thing as a “duplicate content penalty.” At least, not in the way most people mean when they say that.
You can help your fellow webmasters by not perpetuating the myth of duplicate content penalties!
However, when duplicate content is a result of intentionally copying someone else’s website, Google has something to say:
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.
Ultimately, Google will be forced to choose one version of the content to show in search results.
How duplicate content happens and how to fix it
Duplicate content can originate from technical issues, such as an incorrectly configured web server or website. It can also result from content being copied and published elsewhere.
-
URL variations, such as parameters added for click tracking or by some analytics code, can cause duplicate content issues
-
HTTP vs. HTTPS versions can create duplicate content
-
WWW vs. non-WWW versions of a site can create a duplicate of every page
-
Scraped content, particularly manufacturer’s product descriptions reused across e-commerce websites, can appear word for word in multiple locations.
-
Index pages such as index.html or index.php may make your homepage accessible via multiple URLs
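Several of the URL variations above can be collapsed into one normalized form, which helps when auditing which of your URLs are really the same page. The Python sketch below assumes that the HTTPS, non-www version is preferred and that the listed query parameters (`utm_*`, `gclid`, `fbclid`) only track clicks and never change page content; both are assumptions to adjust for your own site, and `canonical_url` is a hypothetical helper name. It only normalizes URLs for analysis; on its own it tells search engines nothing.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for this sketch; adjust them to match your own site.
PREFERRED_SCHEME = "https"  # serve everything over HTTPS
STRIP_WWW = True            # treat example.com, not www.example.com, as preferred
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign",
                   "utm_term", "utm_content", "gclid", "fbclid"}
INDEX_FILES = {"index.html", "index.htm", "index.php"}


def canonical_url(url: str) -> str:
    """Collapse common duplicate-URL variations into one normalized form."""
    parts = urlsplit(url)

    # HTTP vs. HTTPS and WWW vs. non-WWW: force one scheme and one host form.
    host = parts.netloc.lower()
    if STRIP_WWW and host.startswith("www."):
        host = host[len("www."):]

    # Index pages: /blog/index.html and /blog/ are treated as the same page.
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        path = head + "/"

    # Click-tracking/analytics parameters are dropped; the remaining
    # parameters are sorted so their order doesn't create new variants.
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in TRACKING_PARAMS
    )
    # The fragment (#...) is never sent to the server, so it is dropped too.
    return urlunsplit((PREFERRED_SCHEME, host, path, urlencode(kept), ""))


if __name__ == "__main__":
    variants = [
        "http://example.com/",
        "https://www.example.com/index.html",
        "https://example.com/?utm_source=newsletter",
        "http://WWW.Example.com/index.php#top",
    ]
    print({canonical_url(u) for u in variants})  # {'https://example.com/'}
```

Sorting the remaining parameters is a design choice: it assumes your pages never depend on parameter order, which holds for most sites but is worth confirming.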
You can fix essentially all duplicate content issues by specifying which of the duplicates is the intended version.
Whenever content on a site can be found at multiple URLs, it should be canonicalized for search engines.
Here are three main ways to do this:
Set up a 301 redirect
A 301 redirect is a permanent redirect from one URL to another.
Redirecting each duplicate URL to the original with a 301 brings the various URLs under one umbrella, so search engines credit the ranking signals from inbound links to all of those addresses to a single page.
Sites commonly use these redirects to map common URL conventions (http:// vs. https://, www vs. non-www) to one preferred URL, maximizing domain authority.
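A site-wide 301 redirect is usually configured in the web server itself, but the logic can be sketched as standard-library Python WSGI middleware. The sketch assumes `https://example.com` (no www) is the preferred version; `redirect_to_preferred`, `preferred_url`, and the stand-in `hello_app` are hypothetical names. Requests for an http:// or www address get a 301 response whose `Location` header points to the same path and query string on the preferred host; every other request passes through to the wrapped application.

```python
from urllib.parse import urlsplit, urlunsplit
from wsgiref.util import request_uri

# Assumption for this sketch: https://example.com (no "www") is preferred.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "example.com"


def preferred_url(url: str) -> str:
    """Swap in the preferred scheme and host; keep the path and query."""
    parts = urlsplit(url)
    return urlunsplit((PREFERRED_SCHEME, PREFERRED_HOST, parts.path, parts.query, ""))


def redirect_to_preferred(app):
    """WSGI middleware that 301-redirects http:// and www requests."""
    def middleware(environ, start_response):
        # Rebuild the URL the visitor requested, including the query string.
        requested = request_uri(environ, include_query=True)
        target = preferred_url(requested)
        if requested != target:
            # Permanent redirect, so search engines can consolidate signals on `target`.
            start_response("301 Moved Permanently", [("Location", target)])
            return [b""]
        return app(environ, start_response)
    return middleware


def hello_app(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello from the preferred URL"]


application = redirect_to_preferred(hello_app)
```

For a quick local try, `wsgiref.simple_server.make_server` can serve `application`, though requests to `localhost` will be redirected to the assumed production host. Behind a proxy or load balancer that terminates TLS, `wsgi.url_scheme` may report `http` even for HTTPS visitors, which would make this middleware redirect in a loop; in that setup the original scheme would have to be read from a header the proxy sets, such as `X-Forwarded-Proto`.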
Use the rel=canonical attribute
The rel="canonical" attribute is used on a <link> element in the HTML head of a web page, and it should be added to the head of each duplicate version of a page.
Its purpose is to tell search engines that a specific page should be treated as though it were a copy of a specified URL, and all of the links, content metrics, and ranking power should be credited to the one specified URL.
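The tag itself is a single `<link>` element. The Python sketch below builds it with the URL safely HTML-escaped and, in the other direction, uses the standard library's `html.parser` to list the canonical URLs a page declares, which can help when auditing templates. The function and class names (`canonical_link_tag`, `CanonicalFinder`, `find_canonicals`) and the example URL are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


def canonical_link_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of each duplicate page."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> element on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated list of values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html_text: str) -> list:
    """Return every canonical URL a page declares (normally at most one)."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    finder.close()
    return finder.canonicals


if __name__ == "__main__":
    tag = canonical_link_tag("https://example.com/shoes?color=red&size=9")
    print(tag)  # <link rel="canonical" href="https://example.com/shoes?color=red&amp;size=9">
    page = f"<html><head>{tag}</head><body>Red running shoes</body></html>"
    print(find_canonicals(page))  # ['https://example.com/shoes?color=red&size=9']
```

A page that declares more than one canonical URL sends conflicting signals, so an audit built on this sketch would typically flag any page where the returned list has more than one entry.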
Set the preferred domain of your site
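Google Search Console allowed you to set the preferred domain of your site and to specify whether Google should crawl particular URL parameters differently (a feature called parameter handling).
The main limitation of these settings was that they applied only to Google; they did not affect the crawlers of Bing or any other search engine. Google has since retired both features (the preferred domain setting in 2019 and the URL Parameters tool in 2022), so for Google the same job is now done with 301 redirects and the rel="canonical" attribute.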
Check out more about duplicate content
Learn more about duplicate content by checking out these resources:
-
Google: Duplicate content
-
Search Engine Land: The myth of the duplicate content penalty
Want more insights? Contact our digital marketing experts at RLC Media to start growing your online business today.