After last time’s post on WordPress permalinks, I’ve been questioning whether to append .html to the end of post URLs. So, I did some exhaustive research on the matter.
Granted, the .html is faux in nature. These aren’t static HTML pages; we’re just adding it to help improve search engine rankings.
Appending .html does a few things:
1) Prevents /my-article/ and /my-article from being viewed as different pages. Bloggers are only going to link to the .html version… since bloggers know not to touch a file extension. This forces the canonical URL… nobody would even consider linking to anything other than the URL the post was published at.
2) Encourages not linking to comments. /my-article.html#comments stands out more than /my-article/#comments (in other words, bloggers are more likely to chop off the #comments and link to the bare .html article). However, search engines may be less likely to view it as the same page (they may treat /my-article/#comments as the article itself, but see .html and .html#comments as different), so it is a bit of a double-edged sword.
3) Pages can be made static down the road, in case you want to change platforms in the future. Of course, you’d have to go through and use some sort of script to do this, but then you can keep the articles where they stand.
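To make reason #1 concrete, here is a minimal Python sketch of how a crawler that compares paths byte for byte would count pages. The example.com URLs and both function names are made up for illustration; this is not how any particular search engine works, just the failure mode described above.

```python
from urllib.parse import urlsplit


def distinct_pages(links):
    """Count pages the way a naive crawler might: exact path match only."""
    return {urlsplit(link).path for link in links}


def distinct_pages_normalized(links):
    """Same idea, but treating a trailing slash as insignificant."""
    return {urlsplit(link).path.rstrip("/") or "/" for link in links}


# Hypothetical inbound links written by two different bloggers.
links = [
    "http://example.com/my-article/",
    "http://example.com/my-article",
]

print(distinct_pages(links))             # two different path strings
print(distinct_pages_normalized(links))  # one path once the slash is ignored
```

With a .html permalink there is no slash for a linking blogger to add or drop, so the naive comparison and the normalized one agree without any help from the search engine.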
Reason #1 has been diminished significantly by Google. Google deployed a technology internally called BigDaddy, which tends to these minor un-canonicalities. Reason #2 is a toss-up. Reason #3 is the one that becomes the winner.
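For reason #3, the "some sort of script" could be as simple as the Python sketch below. It assumes you have already pulled each post's slug and rendered HTML out of the old platform into a dictionary; the export_static name, the posts dictionary, and the output directory are all hypothetical, and a real migration would also need to handle images, feeds, and comments.

```python
from pathlib import Path


def export_static(posts: dict[str, str], docroot: Path) -> list[Path]:
    """Write each post to docroot/<slug>.html so the old permalink still resolves."""
    docroot.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, html in posts.items():
        target = docroot / f"{slug}.html"
        target.write_text(html, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical data: slug -> rendered page, exported from the old platform.
    posts = {"my-article": "<html><body><h1>My Article</h1></body></html>"}
    for path in export_static(posts, Path("static-export")):
        print(path)
```

The point is that /my-article.html maps straight onto a file name. A /my-article/ permalink can be made static too, but it usually means creating one directory per post with an index file inside it.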
Having transitioned from PHP-Nuke, to Joomla, to WordPress… PhoneNews.com has been a huge loser in the search engine ranking business. The switch to WordPress should end that, but nobody can foresee the future of SEO. We may have to change platforms over there, and that holds even more true for my own blog (since I’ll probably be writing on here much longer than on PhoneNews.com; it is my personal blog, after all).
In the end, should you add .html? That’s up to you. You need to ask yourself how important SEO is for your site, and whether your existing structure already needs changing. Since I was already having to change permalink structure… it made sense to toss .html into the mix at the same time.
One issue you *may* run into with adding .html onto the end of your permalinks with WordPress is that some plugins may not work correctly, specifically image galleries and other things that create their own internal multi-page views separate from WordPress’s pages (such as a gallery split across internal pagination). I’m not certain they will break, but I’ve seen reports on the WordPress support site about problems caused by the .html getting in the way of things. (I guess I should eventually check that with my own image gallery plugin, AWSOM Pixgallery.)
If they don’t work, it’s a plugin bug. If they’re tapping the internal array properly in WordPress 2.X, those bugs shouldn’t surface after version 2.2.2… array bugs regarding .html should have been fixed in 2.3.
Probably the easiest workaround if you run into those issues is to build such content as, well, pages… page URL structure is not affected by standard permalink changes (though if you go under the hood, you can change it)… only posts are affected.
I wouldn’t say it’s not a concern, but it’s a minor one and could probably be fixed easily in most offending plugins.
I am not sure what you mean by search engines seeing .html and .html#comments as different URLs… Do you find both getting indexed in webmaster tools? I don’t think search engines will treat them as different… the # is meant to name a location within a web page, so search engines would not consider them different.
I’ve seen it on search engines other than Google in the past. Google remedied this with BigDaddy technology, like I said above. Of course, the best SEO position is one that doesn’t just take the big three (Yahoo, Live, and Google) into account… but it’s a minor concern.
The only problem would be if a bunch of big sites (Slashdot, digg, etc) picked up on the #comments link, and other sites (say, national media) started hitting the non-linked version. That could cause a PageRank split on engines other than Google.
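Here is a rough Python sketch of that split. The link list and the count_links helper are invented for illustration, and real ranking is far more involved than counting links; the only point is what happens when an engine does or doesn’t drop the #fragment before tallying.

```python
from collections import Counter
from urllib.parse import urldefrag


def count_links(links, strip_fragment):
    """Tally inbound links per URL, optionally dropping the #fragment first."""
    counts = Counter()
    for link in links:
        key = urldefrag(link).url if strip_fragment else link
        counts[key] += 1
    return counts


# Hypothetical inbound links: two big sites used the #comments link,
# one other site linked to the plain article.
links = [
    "http://example.com/my-article.html#comments",
    "http://example.com/my-article.html#comments",
    "http://example.com/my-article.html",
]

print(count_links(links, strip_fragment=False))  # credit split across two keys
print(count_links(links, strip_fragment=True))   # all credit on one URL
```

An engine that strips the fragment sees a single page with three links, which is the behavior the commenter above expects; one that doesn’t would split the credit two to one.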