
How to avoid duplicate content in order to get a better search engine ranking

Nowadays most popular search engines filter duplicate content. This article explains what duplicate content filtering is, how it may affect you, and how to avoid its negative effects.

What is duplicate content?

If two or more distinct web pages have almost the same content, we say that these pages have duplicate content. Note that the content does not have to be 100% identical for the pages to be detected as duplicates; even if there are differences between the pages, they can still be considered duplicates if those differences are small enough.

Sometimes we know that one of the pages appeared first on the web and that the remaining pages later copied its content. In this case we say that the first page is the original and the remaining pages are duplicates.

How do search engines treat duplicate content?

Major search engines, including Google and Yahoo, detect duplicate pages and then decide which one is the original and which are duplicates. For all searches, only the original page is displayed, while the duplicates are hidden by default. The user is given an option to view the duplicates, but most users will not bother to do so.

Why does duplicate content appear on the web, and how can you work around the problems it causes with search engines?

The most reliable way to avoid such problems is not to create duplicate content in the first place. How to do that depends on the cause of the duplication. There are several common reasons why duplicate content appears on the web; below is specific advice for each case:

  • Submission of the same content to multiple websites in order to reach more readers and drive more traffic to the original website. The more websites publish an article, the more readers are likely to see it. If the article links to the original website, a larger audience also means more people following that link, ultimately driving more visitors to the original website.
  • Submission of the same content to multiple websites in order to increase the search engine ranking of the original website. Most popular search engines, including Google and Yahoo, use the number of hypertext links pointing to a website as a measure of its popularity. If the submitted article links to the original website, it will naturally lead to more incoming links and therefore to a higher search engine ranking for the original website.

    Regardless of the exact reason for submitting the content to multiple websites or directories, the problem is easy to fight in this case, because you control the content that is being sent out. When registering your website with different directories, send a different description to each one. If it is not feasible to write a separate description for every directory, create 10-20 descriptions (the more the better) and pick one at random for each submission, as in the sketch below. This is not as good as using a completely different description for each directory, but it is still much better than using the same description everywhere.
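
    As a small illustration of that last point, here is a minimal Python sketch (the description texts and the pick_description helper are hypothetical, not part of any particular tool) that picks one description at random for each submission:

        import random

        # A pool of hand-written, distinct descriptions of the same website
        # (placeholder texts; ideally you would write 10-20 real variants).
        DESCRIPTIONS = [
            "Example description A of your website.",
            "Example description B, worded differently.",
            "Example description C, worded differently again.",
        ]

        def pick_description():
            # Choose one description at random for the current directory submission.
            return random.choice(DESCRIPTIONS)

        print(pick_description())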

  • Automatic aggregation of news sections and RSS feeds. If you have a news section or an RSS feed on your website, then it is possible that your website content is being aggregated and published on other websites even without your knowledge.

    This is a somewhat tricky issue: on the one hand, you want your articles and news to be displayed on other websites, since that way they reach a much broader audience. On the other hand, you do not want the duplicate pages to overshadow your original pages. Probably the best solution in this case is to give the web spider or aggregator different content from the one displayed on your own page (see the sketch below).
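
    One common way to do this (offered here as an illustration, not a recipe prescribed by the article) is to publish only a short excerpt of each article in the RSS feed while keeping the full text on your own page. Below is a minimal Python sketch using the standard library; the article data and excerpt length are hypothetical:

        import xml.etree.ElementTree as ET

        # Hypothetical article as it appears in full on your own page.
        article = {
            "title": "How to avoid duplicate content",
            "link": "http://www.your-host-name.com/articles/duplicate-content.html",
            "body": "Nowadays most popular search engines filter duplicate content. " * 20,
        }

        def make_feed_item(article, excerpt_length=200):
            # The feed item gets only a short excerpt, not the full page content,
            # so aggregators republish different text from the original page.
            item = ET.Element("item")
            ET.SubElement(item, "title").text = article["title"]
            ET.SubElement(item, "link").text = article["link"]
            excerpt = article["body"][:excerpt_length].rstrip() + "..."
            ET.SubElement(item, "description").text = excerpt
            return item

        print(ET.tostring(make_feed_item(article), encoding="unicode"))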

  • Misconfigured web servers that allow access to the same website through different hostnames. Most popular web servers support website aliases, i.e. the website data can be served for one main hostname and for several other hostnames called "aliases". For example, your website may be displayed at http://www.your-host-name.com, but you will see the same website when you visit http://your-host-name.com or http://demo.your-host-name.com. While this is quite convenient for users, who can reach your website even if they do not remember the exact URL, it effectively leads to duplicate content, since from the search engine's point of view http://www.your-host-name.com and http://your-host-name.com are different websites.

    The standard solution in this case is to use HTTP permanent redirects (301) to send visitors from the duplicate hostnames to the original one. Search engines are intelligent enough not to treat the redirecting sites as duplicates.

    The way to implement such redirects varies between web servers. In Apache (the world's most popular web server), for example, this can be done with the mod_rewrite module, as in the sketch below.
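
    For instance, assuming that http://www.your-host-name.com (the placeholder hostname from the example above) is the canonical address, a minimal .htaccess sketch using mod_rewrite might look like this:

        RewriteEngine On
        # Redirect any request whose Host header is not the canonical www hostname
        RewriteCond %{HTTP_HOST} !^www\.your-host-name\.com$ [NC]
        # Send a permanent (301) redirect to the same path on the canonical hostname
        RewriteRule ^(.*)$ http://www.your-host-name.com/$1 [R=301,L]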

  • Website load balancers that use multiple web servers working on different hostnames and different IP addresses. This is often the case when an organization wants to perform load balancing based on the visitor's geographical location. For example, users who visit http://www.your-host-name.com from the USA are redirected to http://us.your-host-name.com, while visitors from Europe are redirected to http://eu.your-host-name.com. In this case the websites at http://us.your-host-name.com and http://eu.your-host-name.com will have identical or nearly identical content.

    The only solution in this case that does not involve a complete reorganization of the load-balancing infrastructure is to actually display different content on the different hostnames. In practice this is usually not a real problem, since this kind of load balancing is typically done to serve clients located in different geographic regions. Because visitors from different regions usually speak different languages, the content of each regional website can be written in a different language, thus avoiding duplication of content.

  • Fake websites created specifically to divert traffic from the original site. Creating fake websites that mirror the original content and divert searches from the original website to the fake ones is a well-known black-hat SEO technique. It exploits the fact that search engines cannot determine with 100% accuracy which website is the original and which are the duplicates. As a result, in some cases one of the fake pages is considered to be the original and is displayed for all searches, while the real page is not displayed. Effectively this means that the fake page diverts searches (and hence visitors) from the real one.

    The best solution in this case is to contact the company that hosts the duplicate website and ask it to take the site down, since it violates your copyright.