Duplicate content means having the same or similar content on multiple pages.
But you can deal with duplicate content issues by:
- Rewriting and merging duplicated content,
- Creating a consistent URL structure, and
- Adding canonical tags and redirects where appropriate.
These technical solutions to optimizing duplicate content may sound complex. But let me show you how to apply them without breaking a sweat.
How to optimize duplicate content
Optimizing duplicate content can be a real struggle, especially if you manage a large website with thousands or millions of pages.
Let’s work this problem out step-by-step!
Psst. Wondering how to find duplicate content issues on your website in the first place? Head straight to our Q&A corner at the end of this article.
Use canonical tags to tell search engines which page holds the main version of a given piece of content. This is also the page that bots should index.
Take a look at how ikea.com uses a canonical tag on one of its pages:
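The ikea.com screenshot isn’t reproduced here, but the tag itself is a `<link rel="canonical" href="...">` element in the page’s `<head>`. If you want to check which canonical a page declares without opening its source, here’s a minimal Python sketch using only the standard library. It assumes you already have the page’s HTML as a string; the example.com URL is a hypothetical placeholder.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel can hold several space-separated values, so split before checking.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values and attributes.get("href"):
            self.canonicals.append(attributes["href"])


def find_canonicals(html):
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonicals


page = """<html><head>
<link rel="canonical" href="https://www.example.com/products/sofa/">
</head><body>Sofa</body></html>"""

print(find_canonicals(page))
```

A page should normally declare exactly one canonical. If the list comes back with several different URLs, search engines may ignore them.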
Have you decided on the canonical versions of your URLs? If so, check your site’s internal links and make sure each one points to the correct URL version.
You can also contact Onely for internal linking optimization.
Add redirects from the non-canonical URLs to their canonical versions
Redirects help you merge ranking signals under one URL. If Google sees a redirect, it should remove the old version of a page from the index and transfer PageRank to the new version. Thanks to that, Google should only index the target page.
A 301 (permanent) redirect is generally the best option for managing duplicate content.
However, when implementing redirects, avoid redirect chains and loops. They may lead to “Redirect error” issues in Google Search Console and waste your crawl budget.
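To spot chains and loops before they show up in Google Search Console, you can audit your redirect rules offline. Here’s a minimal sketch, assuming you can export your redirects as a simple source-to-target mapping (the URLs below are hypothetical). It reports how many hops each URL takes and flattens chains so every old URL can 301 straight to its final target.

```python
# Hypothetical redirect map exported from a server or CMS: source URL -> target URL.
REDIRECTS = {
    "http://example.com/a": "https://example.com/a",
    "https://example.com/a": "https://www.example.com/a",
    "https://www.example.com/x": "https://www.example.com/y",
    "https://www.example.com/y": "https://www.example.com/x",
}


def resolve_redirect(start, redirects):
    """Follow redirects from `start`; return (final_url, hops, is_loop)."""
    seen = {start}
    current, hops = start, 0
    while current in redirects:
        current = redirects[current]
        hops += 1
        if current in seen:
            return current, hops, True  # came back to a URL already visited
        seen.add(current)
    return current, hops, False


def flatten_redirects(redirects):
    """Point every source straight at its final target; loops are left out for manual fixing."""
    flat = {}
    for source in redirects:
        final_url, _, is_loop = resolve_redirect(source, redirects)
        if not is_loop:
            flat[source] = final_url
    return flat


for source in REDIRECTS:
    final_url, hops, is_loop = resolve_redirect(source, REDIRECTS)
    if is_loop:
        print(f"LOOP: {source}")
    elif hops > 1:
        print(f"CHAIN ({hops} hops): {source} -> {final_url}")
```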
Decide if bots should crawl your duplicate content
This decision depends on the type of duplicate content and how you intend to deal with it.
If you want to redirect your duplicate pages, remember that Google needs to be able to crawl these URLs. Otherwise, it won’t see the redirects and won’t respect them.
The same applies if you rewrote your duplicate content or added a canonical tag pointing to the main version of the content. Google will need to crawl the page to reassess it.
But what if you have duplicate content that doesn’t provide value for your site and you can’t make changes to it? Then, block search engines’ access to it in your robots.txt file.
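Before you deploy robots.txt changes, you can test which URLs they block. Here’s a minimal sketch with Python’s built-in `urllib.robotparser`; the `/legacy-copies/` and `/feeds/` paths are hypothetical. Note that this parser does simple prefix matching and doesn’t support the `*` and `$` wildcards that Googlebot understands, so treat it as a rough check rather than a replica of Google’s behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking two low-value duplicate sections.
ROBOTS_TXT = """\
User-agent: *
Disallow: /legacy-copies/
Disallow: /feeds/
"""


def is_blocked(url, user_agent="Googlebot"):
    """Return True if the rules above stop `user_agent` from crawling `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return not parser.can_fetch(user_agent, url)


for url in ("https://www.example.com/legacy-copies/page-1",
            "https://www.example.com/products/sofa/"):
    print(url, "blocked" if is_blocked(url) else "crawlable")
```

It’s worth running the URLs you redirect or canonicalize through the same check: as explained above, Google has to crawl them, so `is_blocked` should return False for those.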
Is your page “Blocked by robots.txt” in Google Search Console?
Read my article to make sure that you blocked it from crawling on purpose.
Use a noindex tag
Add a noindex tag to pages that shouldn’t be indexable by search engines but should remain visible to users.
In particular, add a noindex tag to your internal search results pages. That’s because internal search can generate many variations of your pages.
See how ikea.com handles it. Even if users generate identical or similar URLs, bots won’t index these pages. You can check if a given page uses a noindex tag using the Robots Exclusion Checker tool:
But if you see that an internal search page could answer the user intent, feel free to leave it indexable.
Watch out for using canonical and noindex tags together, as they may send mixed signals to bots. Combining both tags on one page may lead bots to choose a different canonical page than you intended.
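If you’d rather script these checks than run the Robots Exclusion Checker page by page, here’s a minimal sketch. It looks for `noindex` in a `<meta name="robots">` tag or an `X-Robots-Tag` response header, reads the canonical tag, and flags pages that use both. It assumes you’ve already fetched the page’s HTML and headers; the URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IndexingSignals(HTMLParser):
    """Collects meta robots directives and canonical links from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.robots_directives = set()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attributes.get("content", "").split(","):
                self.robots_directives.add(directive.strip().lower())
        elif tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonicals.append(attributes.get("href", ""))


def audit_page(url, html, headers):
    """Report noindex, the declared canonical, and whether both are present."""
    signals = IndexingSignals()
    signals.feed(html)
    signals.close()
    # Header names are case-insensitive, so normalize them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Simplification: values with a user-agent prefix (e.g. "googlebot: noindex") are not parsed.
    header_directives = {
        d.strip().lower() for d in lowered.get("x-robots-tag", "").split(",")
    }
    noindex = "noindex" in signals.robots_directives or "noindex" in header_directives
    canonical = urljoin(url, signals.canonicals[0]) if signals.canonicals else None
    return {
        "noindex": noindex,
        "canonical": canonical,
        # The combination the text warns about: noindex and a canonical on one page.
        "mixed_signals": noindex and canonical is not None,
    }


search_page = """<head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="/search">
</head>"""
print(audit_page("https://www.example.com/search?q=sofa", search_page, {}))
```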
Adjust your URL structure
Even minor differences in URL structure make bots treat your pages as separate URLs. That’s fine when these pages contain unique content. But having two different URLs with identical content is an easy way to create duplicates.
Pay attention to the aspects below to avoid inconsistent URL structures:
www and non-www versions
You may have URLs on your site that users can access both:
- without www, like example.com, and
- with www, like www.example.com.
Whether or not you use www, and whichever protocol (HTTP or HTTPS) you use, keep it consistent.
Did you discover any URLs that don’t follow the selected pattern? Then, use 301 redirects from the non-preferred versions to the preferred one.
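Redirect rules usually live in your web server or CDN configuration, and the syntax depends on the software you use. The logic behind them is the same, though. Here’s a minimal Python sketch, assuming HTTPS on the www host is your preferred pattern; `PREFERRED_HOST`, `SITE_HOSTS`, and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the preferred pattern is HTTPS on the www host.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
SITE_HOSTS = {"example.com", "www.example.com"}


def redirect_target(url):
    """Return the preferred URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    if (parts.hostname or "") not in SITE_HOSTS:
        return None  # not one of this site's hosts; leave it alone
    preferred = parts._replace(scheme=PREFERRED_SCHEME, netloc=PREFERRED_HOST)
    if preferred == parts:
        return None  # already on the preferred protocol and host
    return urlunsplit(preferred)


print(redirect_target("http://example.com/page?x=1"))  # https://www.example.com/page?x=1
```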
Lower-case and upper-case characters
Google treats URLs as case-sensitive. So, for Google, example.com/page and example.com/PAGE will be two different pages.
Lower-case characters are the usual convention in URLs. So if you mix lower and upper case, you may create different URLs with the same content.
Choose the URL with the preferred casing and redirect the incorrect version to it.
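Bots will view identical URLs with and without a trailing slash as different pages. That would be the case with example.com/page and example.com/page/. (The root URL is an exception: example.com and example.com/ are treated as the same page.)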
Once again, ensure you stick to the same URL pattern and redirect the wrong pages if necessary.
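The case and trailing-slash rules can be expressed as one normalization step. Here’s a minimal sketch, assuming your site prefers lower-case paths with a trailing slash and has no paths that are legitimately case-sensitive; the function names are hypothetical. Any URL for which `needs_redirect` returns True is a candidate for a 301 to `preferred_path_form(url)`.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_path_form(url, trailing_slash=True):
    """Return the URL with a lower-case path and a consistent trailing slash.

    Assumptions: none of the site's paths are meant to be case-sensitive, and
    paths whose last segment contains a dot (e.g. /logo.png) are files that
    should not get a trailing slash.
    """
    parts = urlsplit(url)
    if parts.path in ("", "/"):
        return url  # the root URL is the same page with or without a slash
    path = parts.path.lower()
    looks_like_file = "." in path.rsplit("/", 1)[-1]
    if trailing_slash and not path.endswith("/") and not looks_like_file:
        path += "/"
    elif not trailing_slash and path.endswith("/"):
        path = path.rstrip("/") or "/"
    return urlunsplit(parts._replace(path=path))


def needs_redirect(url, trailing_slash=True):
    """True if the URL differs from its preferred form and should be 301-redirected."""
    return preferred_path_form(url, trailing_slash) != url


print(preferred_path_form("https://www.example.com/Page"))
```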
Tracking or filtering parameters
Do you find filtering out products useful? I sure do! I use it almost every time I shop online.
Sadly, filtering tends to generate mountains of URLs with identical or near-identical content. An example of this could be https://www.example.com/clothes/dresses?size=medium.
Using URL parameters for tracking purposes is another source of duplicate content. These may be UTM parameters to track visits from, e.g., Twitter or your newsletter. Here is an example: https://example.com/page?utm_source=twitter.
Add canonical tags on parameterized URLs that point to the versions without tracking parameters. This is how footlocker.com does it:
On some websites, every URL a visitor requests gets a session ID appended. However, from search engines’ perspective, URLs with session IDs are duplicates of the pages without them.
What you should do is canonicalize the URLs with appended session IDs to their equivalents without that additional information.
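For both tracking parameters and session IDs, the fix comes down to computing a clean URL for the canonical tag. Here’s a minimal sketch: it strips `utm_*` parameters and a hypothetical set of session-ID parameter names (`sessionid`, `sid`) and keeps everything else, such as the `size` filter, since whether filtered pages should be canonicalized is a separate decision.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; use the ones your platform actually appends.
SESSION_PARAMS = {"sessionid", "sid"}


def canonical_url(url):
    """Drop utm_* tracking and session-ID parameters; keep every other parameter in order."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith("utm_") and key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept), fragment=""))


def canonical_link_tag(url):
    """The <link> element to place in the <head> of the parameterized page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


print(canonical_link_tag("https://example.com/page?utm_source=twitter"))
print(canonical_link_tag("https://www.example.com/clothes/dresses?size=medium&sessionid=abc123"))
```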
Having a print-only version of a page at a separate URL means there are two versions of the same content. Here is an example from allrecipes.com.
Take a look at how these URLs differ:
- https://www.allrecipes.com/recipe/21220/mashed-sweet-potatoes/ vs.
In such a case, add a canonical tag on the printable version pointing to the standard version of the page.
And this is what allrecipes.com did:
Optimize your content
Your valuable pages should rank and drive traffic. Ensure they contain unique, high-quality content that targets specific user intent.
Here are some content aspects to consider in your optimization:
Improve product pages
Do you copy the generic product description from the manufacturer? Well, that may not be the best strategy for your eCommerce website. Other stores likely use the same text, so it may lead to duplicate content issues across different domains.
Instead, include more information about your products or services, e.g., in a FAQ.
Create unique and relevant category pages
Search engines love unique content that addresses user intent! Browse through your categories and consider whether each one is necessary.
Ask yourself two questions:
- How are these categories helpful for users?
- How do they fit in your website structure?
What can you do next? Consider removing some category pages or combining them. Do the same for any filtering or sorting options available in the categories.
Do you have a few articles discussing closely related topics? Merge them into one larger, more comprehensive piece of content. It will also help you reduce cannibalization issues within your domain.
This way, you can create helpful content that provides all the information in one place and decrease the number of similar pages.
Also, it may be easier to rank a high-quality article than many average ones that target the same subject.
Create supplementary content
Supplementary content can make duplicate pages more unique and valuable. It also increases their chances of getting indexed and ranking well. Think about what would improve the user experience and help visitors the most.
To do that, browse the pages with little content and consider whether there is anything you can add.
Manage user-generated content
Unique, comprehensive content created by users can be beneficial for your site. Encourage customers to leave reviews and display them on your pages.
Also, prevent thin or duplicate user-generated content on your website. For example, set a minimum number of characters a user needs to write to post a review or ad on your site.
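Here’s a minimal sketch of such a check, run before a review is saved. The 100-character minimum is a hypothetical value, and the duplicate check only catches exact copies after whitespace and letter case are normalized.

```python
MIN_REVIEW_CHARS = 100  # hypothetical threshold; pick one that suits your site


def normalize(text):
    """Collapse whitespace and lower-case the text so trivial variations compare equal."""
    return " ".join(text.split()).lower()


def review_problems(text, existing_reviews):
    """Return the reasons to reject a review; an empty list means it can be posted."""
    problems = []
    cleaned = normalize(text)
    if len(cleaned) < MIN_REVIEW_CHARS:
        problems.append(f"too short: {len(cleaned)} of {MIN_REVIEW_CHARS} characters")
    if cleaned in {normalize(review) for review in existing_reviews}:
        problems.append("duplicate of an existing review")
    return problems


print(review_problems("Great sofa!", []))
```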
Optimize serving international content
If you serve the same content in the same language to different regions, bots may consider these versions duplicates. Add hreflang tags to show Google which language and country each version targets.
Google may still see the content as duplicate even when hreflang attributes are in place, and fold two or more versions together. This is usually not a severe issue, but it can still affect user experience.
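hreflang annotations are `<link rel="alternate" hreflang="...">` elements, and each version of the page should list every version, itself included. Here’s a minimal Python sketch that generates the set; the regional URLs are hypothetical, and `x-default` marks the fallback version for users who don’t match any listed locale.

```python
from html import escape

# Hypothetical English-language versions of one page: hreflang code -> URL.
ALTERNATES = {
    "en-us": "https://www.example.com/us/sofa/",
    "en-gb": "https://www.example.com/uk/sofa/",
    "en-au": "https://www.example.com/au/sofa/",
}
DEFAULT_URL = "https://www.example.com/sofa/"


def hreflang_tags(alternates, default_url=None):
    """Build the alternate links that every listed version should carry in its <head>."""
    tags = [
        f'<link rel="alternate" hreflang="{code}" href="{escape(url)}">'
        for code, url in sorted(alternates.items())
    ]
    if default_url:
        # x-default: the version to show users whose language/region matches none of the above.
        tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}">')
    return "\n".join(tags)


print(hreflang_tags(ALTERNATES, DEFAULT_URL))
```

Because the same block goes on the US, UK, and Australian pages alike, the annotations stay reciprocal.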
Moreover, remember to create content suitable for the specific country you are targeting. Make an effort to localize your content, especially for strategic international markets.
Republishing your content on a different website, e.g., medium.com, may be a great way to expose it to a broader audience. Google is less likely to treat syndicated copies as duplicate content if you indicate the main version, e.g., the one on your website. So remember to have the republished copy include a canonical tag pointing to the original source.
Also, ensure that other sites include a link to your original content that points to the correct URL.
Disable access to staging environments
Staging or testing environments contain a copy of the site available in production. Thus, they shouldn’t be crawlable or indexable by search engines.
Take a look at how many staging websites you can find on Google. And these are only the search results for a single site: query:
Can you imagine how much duplicate content it can generate? I know, it scares me too…
Prevent bots and users from accessing your staging site – use HTTP authentication.
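Many hosts and web servers let you switch on HTTP authentication in their settings. If your staging site happens to be a Python WSGI app, here’s a minimal sketch of the same idea; the credentials are hypothetical placeholders. Bots can’t log in, so they receive a 401 response instead of your staging content.

```python
import base64
import hmac

# Hypothetical credentials; load real ones from configuration, not source code.
STAGING_USER = "staging"
STAGING_PASSWORD = "change-me"


def is_authorized(authorization_header):
    """Check an HTTP Basic `Authorization` header against the staging credentials."""
    if not authorization_header.startswith("Basic "):
        return False
    try:
        encoded = authorization_header[len("Basic "):]
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and invalid UTF-8
        return False
    user, _, password = decoded.partition(":")
    # compare_digest avoids leaking how much of a guess was correct through timing.
    user_ok = hmac.compare_digest(user.encode(), STAGING_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), STAGING_PASSWORD.encode())
    return user_ok and password_ok


def require_basic_auth(app):
    """Wrap a WSGI app so requests without valid credentials get a 401 response."""
    def wrapper(environ, start_response):
        if is_authorized(environ.get("HTTP_AUTHORIZATION", "")):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="staging"'),
            ("Content-Type", "text/plain; charset=utf-8"),
        ])
        return [b"Authentication required"]
    return wrapper
```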
Prevent duplicate content issues caused by CMS
For example, WordPress automatically generates tag and category pages. Such pages can be a severe waste of crawlers’ resources: they are often nearly empty, and Google will often see them as near-duplicates.
You may also find that your CMS creates separate pages for images that don’t contain any other content.
How can you prevent such duplicate issues? Add noindex tags to unwanted pages or disable these features in your CMS.
Remove duplicate content
Change the status code of duplicate pages to 404 or 410 if:
- They serve no purpose for your visitors or your business, and
- You don’t plan to improve them.
Both status codes have the same long-term consequences. But note that 410 may remove pages from the index and limit their crawling faster than 404.
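If you choose 410 for duplicates you removed on purpose, the server needs an explicit list of those URLs, because anything it doesn’t recognize should still return 404. Here’s a minimal sketch with hypothetical paths; on a real site, this lookup would sit in your routing or server configuration.

```python
from http import HTTPStatus

# Hypothetical paths: duplicates you removed on purpose, and pages that stay live.
REMOVED_DUPLICATES = {"/sofa-copy/", "/tag/misc/"}
LIVE_PAGES = {"/", "/sofa/"}


def status_for(path):
    """Choose the response status for a requested path."""
    if path in REMOVED_DUPLICATES:
        return HTTPStatus.GONE  # 410: removed deliberately and not coming back
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    return HTTPStatus.NOT_FOUND  # 404: a URL the site doesn't know about


print(int(status_for("/tag/misc/")))  # 410
```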
How optimizing duplicate content can affect your website
Here are three main ways managing duplicate content may help your website:
Reach your website’s ranking potential
Help search engines decide which page they should index and present in the first place.
If the same content exists on several pages, other domains may link to different versions of it. This can split the total link authority between those pages.
Save crawl budget and avoid indexing issues
You want search engines to spend your crawl budget on valuable content. Don’t let their bots waste resources crawling the same content over and over.
You can make it easier for search engines to see which version of your pages you want to have indexed. Creating a sound indexing strategy will help you avoid self-induced indexing issues.
It also helps prevent search engines from seeing your whole website as low-quality.
Here’s what you can do now:
- Contact us.
- Receive a personalized plan from us to deal with your duplicate content.
- Enjoy your unique content on the web!
Q&A on duplicate content
How to find duplicate content?
Use Copyscape to see which content from your pages appears across the web.
To find out about duplicate content issues on your site, use Siteliner. It uncovers how pages on your site match each other’s content.
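Siteliner does this comparison for you, but the underlying idea is easy to sketch. One common near-duplicate technique (not necessarily the one Siteliner uses) splits each page’s text into overlapping three-word sequences, or shingles, and compares pages by Jaccard similarity: shared shingles divided by all distinct shingles. The page texts and the 0.8 threshold below are hypothetical.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping `size`-word sequences in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles two pages have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(pages, threshold=0.8):
    """Yield (url_a, url_b, similarity) for page pairs at or above the threshold."""
    sets = {url: shingles(text) for url, text in pages.items()}
    for (url_a, set_a), (url_b, set_b) in combinations(sets.items(), 2):
        similarity = jaccard(set_a, set_b)
        if similarity >= threshold:
            yield url_a, url_b, round(similarity, 2)


PAGES = {
    "https://www.example.com/sofa": "Our grey three-seat sofa has soft cushions and a solid oak frame.",
    "https://www.example.com/sofa?color=grey": "Our grey three-seat sofa has soft cushions and a solid oak frame.",
    "https://www.example.com/lamp": "A brass floor lamp with a linen shade and a dimmer switch.",
}

for pair in near_duplicates(PAGES):
    print(pair)
```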
How to use Google Search Console to find duplicate content issues?
Visit the Index Coverage (Page indexing) report. Then, head to the Not indexed list below the chart. The report can show you the following statuses related to duplicate content:
- Duplicate without user-selected canonical
Google found duplicate URLs that aren’t canonicalized to the preferred version. Check which URL Google chose as canonical by using the URL Inspection tool.
Fix this issue by selecting the canonical URL yourself.
- Duplicate, Google chose different canonical than user
Google ignored your specified canonical tag and selected a different canonical URL that it found more suitable.
- Duplicate, submitted URL not selected as canonical
You submitted URLs (e.g., in your sitemap) without specifying a canonical URL. Google considers the submitted URLs duplicates, so it picked a different canonical.
Once again, add canonical tags pointing to the preferred URL.
What are the causes of duplicate content?
Most often, duplicate content appears because of:
- Poor web development, e.g., an unoptimized CMS platform, and
- Faulty implementations on the site, e.g., a wrong server configuration.
Duplicates can appear on all types of sites, but large websites are more prone to them.
Which content types are likely to generate duplicates?
Duplicate content may particularly concern:
- Blog sections
If you create many comparison articles, for example, they may fit several article categories. As a result, many URLs of different categories can lead to the same article.
Speaking as a content creator, that’s something that I’d definitely try to avoid.
- User-generated content
It applies to any site that contains posts, ads, profile pages, etc., created by users. Users often post copied or spammy text, or only add a link to their website on their profile page.
- Listings from databases used by other domains
These are, e.g., marketplaces or real estate sites. As a result, identical ads or posts can appear across several domains.
- Tags that collect content on related topics
You can find them, e.g., on news sites. In some situations, pages can use many tags and appear in various locations on the site.
What elements of eCommerce websites are prone to become duplicate?
Duplicate content on eCommerce sites often applies to the following aspects:
- Product pages with little to no content, or pages that reuse the manufacturer’s product descriptions.
- Category pages with filters that display lists of the same products on many pages.
If you have an eCommerce site, you know how such pages translate to business revenue. Thus, optimizing the pages should be your priority.
Can duplicate content lead to a Google penalty?
It can if it results from malicious activities.
Scraping content is an example of a manipulative practice related to duplicate content. It occurs when someone takes the content from your pages to republish it on their site.
As a safeguard against such practices, add self-referential canonical tags to your existing pages.
Don’t let a Google penalty ruin your online presence. Take advantage of our Google penalty recovery services to get your website back on track and regain your rankings.