Blogger, Robots.txt, Canonical URLs, Feeds - Let's Gain some Synergy
A little rant where I ask Blogger to make a slight change. The story begins...
Several months back, Blogger changed the way they do comments somewhat. The short version: they paginate post pages that receive many comments (200+). This is fine. In doing so, they also had to add some query parameters to comment permalinks so those links could work with the new pagination. Again, nothing wrong with that.
But sometimes googlebot gets confused when a page has multiple urls. These are canonical issues. (Admission: I have trouble both spelling and saying "canonical", but I digress.) This hit our favorite blog Hacks when I noticed last December that hits from Google had suddenly dropped off to almost nothing. When I started looking into it, I found that Google was suddenly grabbing tons of these comment permalink urls and giving them prominence over what should be the real url. An example:
Instead of having this url in the Google index:
............./2006/09/code-for-beta-blogger-label-cloud.html
which is the proper url and has hundreds of links pointing at it, Google was indexing urls like this instead:
......./2006/09/code-for-beta-blogger-label-cloud.html?showComment=1221076440000
That is just a link to a particular comment. But since that url doesn't have any links pointing to it (nor should it), it doesn't rise to the top of searches the way the real url would.
In a perfect world Google would always know which is the better url, and most of the time it probably does get it right. But for whatever reason Google was mucking it up the same way on many of my posts that used to get search hits.
I should say here that ultimately I don't give a shit. Hell, it took me a month to even notice, as I'm not doing a lot of posting. I don't really care too much whether I get hits or not, and I figure it will eventually work itself out. But it's a problem with such an easy solution.
Now my good pal, Notorious I.M.P., pointed me to this recent post from Google Webmaster Central about a new feature that (supposedly) lets you fix canonical issues like this by adding a "hint" for Google with a tag. Well, that's fine. If'n it works.
If you wanted to try it on Blogger, you would add something like this to the head of your template.
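Something along these lines should do it, using Blogger's expr: attribute so each page writes out its own url (the exact quoting may need adjusting to fit your template):

<link expr:href='data:blog.url' rel='canonical'/>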
I've only just tried that, so I don't know yet whether it actually works, but it outputs things the way the Webmaster Central blog says it should.
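On a post page, that should render out as something like

<link href='............./2006/09/code-for-beta-blogger-label-cloud.html' rel='canonical'/>

which is the format their post shows.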
But for Blogger, we are really fighting a battle that has a better solution. Only we users can't implement it; it would have to be done at Blogger's end. A better solution would be to add a few lines to the robots.txt file to account for the duplicate urls that the comment pagination change caused. Blogger already blocks urls with "search" in the path, which correctly blocks redundant label/search pages. A few more tweaks would help, something along the lines of adding these two lines to the robots.txt:
Disallow: /feeds/comments
Disallow: /*?showComment*
Now if I have that right (which I may not), that would 1) keep comment feeds from being crawled, which is mainly where I believe the comment pagination links are being picked up (and besides, comment feeds don't really need to be indexed, do they?), and 2) block any urls with ?showComment in them, which would catch the comment permalinks if they got picked up somewhere else.
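To picture it, the blogspot robots.txt currently looks roughly like this (going from memory), and the two new lines would just slot in under the catch-all user-agent:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Disallow: /feeds/comments
Disallow: /*?showComment*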
Or maybe there is a better way of doing it. Or they could do nothing. I'm just ranting.