ITN & The Problem With Duplicate Content

25 Feb 2010, Posted by admin in Featured,Nohat, 4 Comments

Duplicate content is quite often made to sound worse than it really is – people talk about duplicate content “penalties” and the like, which makes it sound quite dramatic. In truth, duplicate content is a fairly natural part of the web – it happens all the time. It’s also not quite true to say that there’s a penalty – there is, however, a filter. If Google detects duplicate content, say 3 or 4 articles that are all exactly the same, then when it sees a query that deserves that article in the results, it won’t display all of those article pages. It wouldn’t make sense to users if, say, all 10 of Google’s listings were for the exact same article – Google wants to display some variety. As a result, Google will only show one of those articles and will filter out the rest. Usually Google tries to find the originator of the content, the site that wrote it first, and it double-checks this by also seeing whether that site is authoritative enough.
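To make the idea of a filter (rather than a penalty) concrete, here is a toy sketch in Python. It is only an illustration of the behaviour described above, not how Google actually does it: the function names, the whitespace and case normalisation, and the example URLs are all my own assumptions.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and case (an assumed, very crude notion of 'the same article')."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def filter_duplicates(results):
    """Keep one URL per distinct article, preserving the existing ranking order.

    `results` is a list of (url, text) pairs, already sorted best-first.
    Nothing is penalised here: later copies are simply left out of the list.
    """
    seen = set()
    kept = []
    for url, text in results:
        key = fingerprint(text)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


# Hypothetical results: two copies of one article, plus a different page.
results = [
    ("a.example/story", "Same  article text"),
    ("b.example/story", "same article text"),
    ("c.example/other", "A different article"),
]
shown = filter_duplicates(results)
```

In this sketch whichever copy already ranks highest is the one that survives, which is exactly why it matters which copy Google treats as the original.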

There are two main kinds of duplicate content, and they can affect sites in different ways. On-site duplicate content occurs when pages are repeated across one domain. Off-site duplicate content happens when a site’s content is repeated across other domains – ITN.co.uk frequently has its content distributed across orange.co.uk, msn.co.uk and yahoo.co.uk, for example.

On-Site Duplicate Content

Firstly, there’s on-site duplicate content – where you have the exact same page repeated across two or more URLs. An example might be having an article on your main (real) URL, and then having the same article on a printer friendly page. It happens very easily. If you display the full post on the homepage in WordPress, for example, then you run the risk of having that post appear in full on the homepage, on the tag pages, on the category pages and finally on the (real) post page itself. This doesn’t tend to cause major problems unless one of the duplicate pages starts getting all of the links – so if the printer friendly version of the page was the one that was heavily linked to, you may find that version ranking in the search results instead of your real article. The unseen downside is that if that printer friendly page gets a few links, but not enough to rank in place of your real article, those links to the duplicate page will still be less likely to help your real article rank.

You can reclaim those lost links, and ensure that your real article is the one that ranks, by using either 301 redirects to redirect duplicate pages to the real version or by using canonical tags (which are slightly more useful in the case of printer friendly pages).
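As a rough sketch of both options in Python: one helper builds the canonical tag you would put in the `<head>` of the printer friendly page, the other decides whether a request should get a 301 to the real URL. The `example.com` domain and the `/print/` URL layout are assumptions made up for this example, so adjust them to however your own site is structured.

```python
from html import escape

# Assumed site layout: the print view of /articles/foo lives at /print/articles/foo.
CANONICAL_BASE = "https://example.com"
PRINT_PREFIX = "/print"


def canonical_tag(article_path: str) -> str:
    """Return the <link> element that points a duplicate page at the real article."""
    href = escape(CANONICAL_BASE + article_path, quote=True)
    return f'<link rel="canonical" href="{href}">'


def redirect_for(path: str):
    """Return (status, headers) for a 301 if `path` is a print duplicate, else None."""
    if path.startswith(PRINT_PREFIX + "/"):
        real_path = path[len(PRINT_PREFIX):]
        return 301, {"Location": CANONICAL_BASE + real_path}
    return None


tag = canonical_tag("/articles/space-walk")
redirect = redirect_for("/print/articles/space-walk")
```

The canonical tag suits printer friendly pages because visitors can still reach and use the print view, whereas a 301 sends everyone (people and crawlers alike) to the real URL.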

Off-Site Duplicate Content

It’s less common to have content that’s duplicated across a number of different sites, but it still happens. Sites that syndicate out their content, article directory sites and press release sites all have this issue – the exact same article may appear on PRWeb.com and a whole load of other sites that have chosen to pick up that press release.

In either case, you’re not going to get a penalty – it happens naturally.

If you have an article on your site that also appears on a number of other sites, and somebody searches for it, then Google is only going to try and display one result that leads to that article. If you wrote that article, then you should be the one to get that traffic – but it doesn’t always work that way. Google usually tries to display the site that first wrote the content, but sometimes just displays whichever site is most authoritative.

Google works out the originator of the article by looking at who links back. If Site A writes the content and it gets picked up by Sites B, C and D, and B, C and D all link back to A, then it’s a clear signal to Google that it should rank Site A and that the remaining sites should be filtered out. A massive problem arises, though, when the remaining sites don’t link back to that article page – especially if the sites that pick up that article have more authority than the originator.
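The Site A, B, C and D example can be sketched as a tiny link count in Python. To be clear, this is my own toy model of the signal, not Google's algorithm: the function name, the "most inbound links from the other copies wins" rule and the tie handling are all assumptions.

```python
from collections import Counter


def guess_originator(copies, links):
    """Guess which copy of an article is the original.

    `copies` is the set of pages carrying the same article and `links` is an
    iterable of (source, target) pairs. The copy that the other copies link to
    most often wins; with no links, or a tie, there is no clear signal (None).
    """
    inbound = Counter(
        target
        for source, target in links
        if source in copies and target in copies and source != target
    )
    ranked = inbound.most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


copies = {"A", "B", "C", "D"}
clear = guess_originator(copies, [("B", "A"), ("C", "A"), ("D", "A")])
unclear = guess_originator(copies, [])
```

When nobody links back, the sketch has nothing to go on, which is the situation where a real search engine would presumably fall back on other signals such as authority.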

Where does ITN fit in?

ITN.co.uk often create articles and then syndicate them out to other large sites at the same time as publishing them themselves. Because the large sites often don’t link back properly, Google has a hard time working out who the content really belongs to. This article on astronauts carrying out a space walk was originally written by ITN, but was syndicated out to a load of different sites too, including to Yahoo. Even though ITN wrote the story, it launched at the same time on Yahoo (and a number of other sites including MSN), so Google isn’t always sure which site is the originator. As a result, it’s easy for Google to filter out the wrong site – when I search on a snippet of text from that article, ITN is currently filtered out while Yahoo ranks.

How can ITN get their search traffic back?

If I was ITN, I’d look at getting an agreement in place with Yahoo, MSN and the like so that all of the articles syndicated out included a line at the bottom with the article headline linking back to the source, e.g. “Astronauts carry out space walk is an article from ITN News”. While this doesn’t guarantee they won’t be filtered out for duplicate content, it should strongly help their chances – Google will usually look at who everyone links to in order to determine which site originally produced the content. If they wanted to take it a step further (and this may not even be possible with a topic as sensitive as news), they could launch their content, send out a ping to help the article get indexed, and then a few minutes later release the article to Yahoo, MSN and the other big news sites.
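Here is a minimal sketch of what adding that attribution line to a syndicated copy could look like in Python. The function name, the HTML shape of the footer and the article URL are hypothetical; I have no idea how ITN's syndication feed is really built.

```python
from html import escape


def with_attribution(body_html: str, headline: str, source_url: str, source_name: str) -> str:
    """Append a footer whose headline text links back to the original article."""
    link = f'<a href="{escape(source_url, quote=True)}">{escape(headline)}</a>'
    footer = f"<p>{link} is an article from {escape(source_name)}.</p>"
    return body_html + "\n" + footer


syndicated = with_attribution(
    "<p>Story text.</p>",
    "Astronauts carry out space walk",
    "https://www.itn.co.uk/example-article",  # placeholder URL, not the real article
    "ITN News",
)
```

Using the headline as the link text means every syndicated copy links to the source page with descriptive anchor text, rather than a bare "source" link.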

ITN’s best chance of getting their search traffic back is to make sure that they include links back to the article when they syndicate out the content. It’s not guaranteed to work, nothing in SEO is, but the worst case scenario is that they pick up a lot of massively authoritative links.

4 Comments

February 25, 2010 11:08 am

Chris

Nice post. Found it very interesting, I often find myself falling into the mindset of penalties rather than filters. When you change the way you think about such things it tends to open a lot of possibilities or feelings of freedom… Matrix moment: There is no spoon!

March 16, 2010 11:00 pm

Andy

Very interesting post. Here’s an idea:

Modify an autoblog script so that when it copies articles from blogs, it also comments on the page that it stole from. Have the URL in the comment go to a different page on your site that you 301 to the page with the content. That way, google ranks you, not them. More simple:

*BlogA publishes article at BlogA.tld/article.
*Script (for your autoblog) scrapes article from BlogA and posts it to BlogB.tld/article
*Script comments on BlogA.tld/article with a link to BlogB.tld/article301
*BlogB.tld/article301 is a 301 redir to BlogB.tld/article. This gives you the link juice in google’s eyes, plus the owner of BlogA is unlikely to notice that you copied their article

March 19, 2010 6:32 pm

admin

Thanks for the comment Andy. Not sure I follow though – I don’t really advocate scraping other people’s content.

I also don’t think your method would work there – surely the site owner would see you’ve taken their content? (If they follow your link, they’ll still wind up at the same content after the 301). Also the vast majority of blog comments are generally nofollow – so Google won’t treat the copied content that’s nofollow linked to as the original. I would guess, anyway, because otherwise scrapers that use trackbacks would be causing a ton of damage to the SERPs.

April 19, 2010 5:58 am

Darius Money

Great post by the way… From my experience the whole duplicate content penalty is a myth… Reason: I have a niche blog that is built off of duplicate content.. It ranks number 7 for the keyword “digital cameras under $100”. It’s all about giving your site authority, and building quality back links.. So if you have an authority site that has duplicate content you won’t be penalized… If you throw some duplicate content on a spammy made for AdSense blog, then I’m pretty sure Google will penalize you… not because of the duplicate content, but because your site is garbage…
