25 Feb 2010, Posted by admin in Featured, Nohat, 4 Comments
Duplicate content is quite often an issue that’s made to sound worse than it really is – people talk about duplicate content “penalties” and the like, which makes it sound quite dramatic. In truth, duplicate content is a fairly natural part of the web – it happens all the time. It’s also not quite true to say that there’s a penalty – there is, however, a filter. If Google detects duplicate content, say 3 or 4 articles that are all exactly the same, then when it sees a query that deserves that article in the results, it won’t display all of those article pages. It wouldn’t make sense to users if, say, all 10 of Google’s listings were for the exact same article – Google wants to display some variety. As a result, Google will only show one of those articles and will filter out the rest. Usually Google tries to find the originator of the content – the site that wrote it first – and double-checks that by looking at whether the site is authoritative enough.
There are two main kinds of duplicate content, and they can affect sites in different ways. On-site duplicate content occurs when pages are repeated across one domain. Off-site duplicate content happens when a site’s content is repeated across other domains – ITN.co.uk frequently has their content distributed across orange.co.uk, msn.co.uk and yahoo.co.uk, for example.
On-Site Duplicate Content
Firstly, there’s on-site duplicate content – where you have the exact same page repeated across two or more URLs. An example might be having an article on your main (real) URL, and then having the same article on a printer-friendly page. It happens very easily: if you display full posts in WordPress, for example, you run the risk of having the same post appear in full on the homepage, on the tag pages, in the category pages and finally on the (real) post page itself. This doesn’t tend to cause major problems unless one of the duplicate pages starts getting all of the links – so if the printer-friendly version of the page was the one that was heavily linked to, you may find it ranking in the search results instead of your real article. The unseen downside is that if that printer-friendly page gets a few links, but not enough to rank in place of your real article, those links to the duplicate page will still be less likely to help your real article rank.
You can reclaim those lost links, and ensure that your real article is the one that ranks, by using either 301 redirects to redirect duplicate pages to the real version or by using canonical tags (which are slightly more useful in the case of printer-friendly pages).
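To make that concrete, here’s a rough sketch of both options – the URLs and paths are made up purely for illustration. With the canonical tag, the printer-friendly page keeps working for visitors, but a line in its `<head>` tells Google which URL should get the credit:

```html
<!-- In the <head> of the printer-friendly page (example URLs) -->
<link rel="canonical" href="http://www.example.com/articles/space-walk/" />
```

Alternatively, a 301 redirect (shown here as an Apache .htaccess rule) sends both visitors and any link value from the duplicate URL straight to the real one:

```apache
# Example rule – permanently redirect the print version to the real article
Redirect 301 /articles/space-walk/print/ http://www.example.com/articles/space-walk/
```

The canonical tag tends to suit printer-friendly pages better precisely because you usually still want that page to exist – a 301 would stop people being able to view it at all.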
Off-Site Duplicate Content
It’s less common to have content that’s duplicated across a number of different sites, but it still happens. Sites that syndicate out their content, article directory sites and press release sites all have this issue – the exact same article may appear on PRWeb.com and a whole load of other sites that have chosen to pick up that press release.
In either case, you’re not going to get a penalty – it happens naturally.
If you have an article on your site – the same article that appears on a number of other sites – and somebody searches for it, then Google is only going to try and display one result that leads to that article. If you wrote that article, then you should be the one to get that traffic – but it doesn’t always work that way. Google usually tries to display the site that first wrote the content, but sometimes just displays whichever site is most authoritative.
Google works out the originator of the article by looking at who links back. If Site A writes the content and it gets picked up by Sites B, C and D – if B, C and D are all linking back to A then it’s a clear signal to Google that they should rank site A, and that the remaining sites should be filtered out. A massive problem arises, though, when the remaining sites don’t link back to that article page – especially if the sites that pick up that article have more authority than the originator.
Where does ITN fit in?
ITN.co.uk often create articles and then syndicate them out to other large sites at the same time as publishing them themselves. Because the large sites often don’t link back properly, Google has a hard time working out who the content really belongs to. This article on astronauts carrying out a space walk was originally written by ITN, but was syndicated out to a load of different sites too, including Yahoo. Even though ITN wrote the story, because they launched it at the same time as Yahoo (and a number of other sites including MSN), Google isn’t always sure which site is the originator. As a result, it’s easy for Google to filter out the wrong site – currently a search on a snippet of text from that article for me shows that ITN is filtered out, while Yahoo ranks.
How can ITN get their search traffic back?
If I were ITN, I’d look at getting an agreement in place with Yahoo, MSN and the like so that every syndicated article included a link at the bottom, using the article headline as anchor text pointing back to the source, e.g. “Astronauts carry out space walk is an article from ITN News“. While this doesn’t guarantee they won’t be filtered out for duplicate content, it should significantly improve their chances – Google will usually look at who everyone links to in order to determine which site originally produced the content. If they wanted to take it a step further (and this may not even be possible with a topic as sensitive as news), they could launch their content, send out a ping to help the article get indexed, and then a few minutes later release the article to Yahoo, MSN and the other big news sites.
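The attribution link at the bottom of each syndicated copy might look something like this – the URL here is just a placeholder, not ITN’s real article address:

```html
<p>
  <a href="http://www.itn.co.uk/example-article-url">Astronauts carry out space walk</a>
  is an article from ITN News
</p>
```

The key detail is that the link points at the specific article page on ITN, not just the homepage – it’s the article page that’s competing in the duplicate content filter, so that’s the page that needs the link.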
ITN’s best chance of getting their search traffic back is to make sure that they include links back to the article when they syndicate out the content. It’s not guaranteed to work, nothing in SEO is, but the worst case scenario is that they pick up a lot of massively authoritative links.