Duplicate Content Problems with the Drupal Quicktabs Module

I'm a big fan of the Quicktabs module, which lets you easily create tabbed content in Drupal. However, there is an SEO problem you should be aware of when using it.

On my Pitbulls.org site, I noticed Google indexing multiple versions of the same page. This is known as duplicate content, and it can hurt your rankings. Since I had already modified my robots.txt file specifically to prevent this kind of thing, I was a little confused.

But the cause turned out to be simple. Quicktabs creates "unique" URLs for each page it is displayed on, so that it can take the correct action when a user clicks a tab.

So on my site, for instance, the "Most Popular" tab links to "http://www.pitbulls.org/content/welcome-pitbullsorg?quicktabs_1=0#quicktabs-1"

And Google indexes this link, thinking it's a separate page from http://www.pitbulls.org/content/welcome-pitbullsorg . But it's not.

Don't worry.  The solution is simple.

In your robots.txt file, add the line:

Disallow: /*quicktabs_*
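Google treats `*` in a robots.txt path as "any run of characters", so this rule catches every URL that contains `quicktabs_`. To sanity-check which URLs a rule like this would catch, here is a minimal Python sketch of that matching. It assumes only the `*` wildcard and a trailing `$` end anchor matter, and it ignores `Allow` lines and rule precedence. `crawl_path` and `robots_pattern_matches` are hypothetical helper names, not part of any library. Python's own `urllib.robotparser` only does prefix matching, which is why the pattern is turned into a regular expression here. Note that the `#quicktabs-1` fragment is never sent to the server, so only the path and query string are tested.

```python
import re
from urllib.parse import urlsplit


def crawl_path(url: str) -> str:
    """Return the part of a URL a crawler requests: path plus query string.

    The fragment (e.g. "#quicktabs-1") is dropped because browsers and bots
    never send it to the server.
    """
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Check a Google-style robots.txt path pattern against a crawl path.

    '*' matches any run of characters and a trailing '$' anchors the match
    to the end; everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors only at the start, as robots.txt path rules do.
    return re.match(regex, path) is not None


rule = "/*quicktabs_*"
for url in (
    "http://www.pitbulls.org/content/welcome-pitbullsorg?quicktabs_1=0#quicktabs-1",
    "http://www.pitbulls.org/content/welcome-pitbullsorg",
):
    verdict = "blocked" if robots_pattern_matches(rule, crawl_path(url)) else "allowed"
    print(url, verdict)
```

The first URL is blocked and the plain page stays crawlable, which is exactly the split the rule is meant to make.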

And this little problem is solved.

UPDATE: After some more research, I found that the above might not resolve the problem completely. Google cannot crawl a blocked URL, but it can still discover the URL through links and may put it in the index anyway. A robots.txt block is better than nothing, but it may still pass PageRank to a page you don't want to rank.

So there are two possible ways to solve this. If you use either of them, do not also use Disallow in robots.txt: a blocked bot never fetches the page, so it never sees the tag that tells it what to do.

Canonical URLs

One is the canonical tag. This tells search engines to look to another page for the "real" content. It is also useful for pages of data that can be sorted. The data is essentially the same, just in a different order, so you don't want search engines to treat each ordering as a separate page. For example, let's say you had http://www.example.com/data, and when you sorted the data the URL became http://www.example.com/data?sort=blahblahblah.

Since it's a different URL, a search engine will treat it as a different page.  But you don't want that. So you add: 

<link rel="canonical" href="http://www.example.com/data"/>

between the <head> tags of the http://www.example.com/data?sort=blahblahblah page so it points back to the core page.
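The canonical URL is usually just the requested URL with the duplicate-making parameters removed. As a rough illustration (this is not how Nodewords does it), the Python sketch below strips any query parameter whose name starts with one of a hypothetical list of prefixes (`DUPLICATE_PARAM_PREFIXES`, assumed here to be `quicktabs_` and `sort`), drops the fragment, and builds the `<link>` element shown above. `canonical_url` and `canonical_link_tag` are illustrative names only.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: query parameters that only re-order or re-render the same
# content. Adjust for your own site.
DUPLICATE_PARAM_PREFIXES = ("quicktabs_", "sort")


def canonical_url(url: str) -> str:
    """Strip duplicate-making query parameters and the fragment from a URL."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.startswith(DUPLICATE_PARAM_PREFIXES)
    ]
    # urlunsplit omits the '?' and '#' when query and fragment are empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page at `url`."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="%s"/>' % href


print(canonical_link_tag("http://www.example.com/data?sort=blahblahblah"))
```

Keeping the other parameters is deliberate: a blunt "drop the whole query string" rule would also collapse pages such as `?page=2` that really do show different content.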

You can read more about it at http://www.google.com/support/webmasters/bin/answer.py?answer=139394

The Nodewords module can help you add canonical tags to nodes, so even if the path has extra variables, it will be pointing to the correct canonical URL.

NOINDEX Tags

If a page has a robots "NOINDEX, FOLLOW" meta tag, search engines will crawl the page but will not add it to the index, so the page will not show up in the results and will not count as duplicate content.


<meta name="robots" content="noindex, follow">


The "follow" is very important.  It allows any links on the page to continue to pass link juice to other pages.

Nodewords can also help you add this tag to nodes and pages. However, it is less useful here because you can't easily target only the pages that have extra query strings. And misuse of the NOINDEX tag can have severe consequences for your site.
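The risky part of NOINDEX is the targeting, so it helps to spell the rule out. A minimal Python sketch, assuming that any URL carrying a `quicktabs_` query parameter is a duplicate of the same URL without it; `DUPLICATE_PARAM_PREFIXES` and `robots_meta_for` are hypothetical names, not a Drupal or Nodewords API:

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit

# Hypothetical: treat any URL carrying a Quicktabs state parameter as a
# duplicate of the same URL without it.
DUPLICATE_PARAM_PREFIXES = ("quicktabs_",)

NOINDEX_FOLLOW = '<meta name="robots" content="noindex, follow">'


def robots_meta_for(url: str) -> Optional[str]:
    """Return a noindex,follow tag for duplicate URL variants, else None."""
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if any(name.startswith(DUPLICATE_PARAM_PREFIXES) for name, _ in params):
        return NOINDEX_FOLLOW
    # The plain page (and anything unrecognized) stays indexable.
    return None
```

Returning `None` for every other URL keeps the tag off the real pages themselves, which is the mistake that makes NOINDEX dangerous.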

So I recommend you use the "canonical URLs" solution.


Comments

Good article. I also stumbled across this duplicate content issue with Quicktabs, but this solution will only work on Google because it uses a wildcard.

No, it will work for Bing and Yahoo as well. They support wildcards. They didn't back in 2005. At least MSN did not.

For anyone affected by this problem, this issue is being worked on in the Quicktabs issue queue:

http://drupal.org/node/354867