Joe White's Blog: Life, .NET, and cats

Apparently, GMail isn't yet XHTML-compliant #Life

For a while now, I've been subscribed to get e-mails whenever someone posts something to the FxCop forum. (How primitive. I'm still waiting for them to get their act together and start providing RSS feeds like a civilized Web site.)

As soon as I got my GMail account, I started sending the forum notifications there. And let me tell you, GMail's threading features are nice for reading mailing-list e-mails. It automatically groups all the messages with the same subject, and lets you read an entire thread at once, on a single screen. Very slick.

The only problem is, GMail strips out all the line breaks. So the entire message gets rendered as a single paragraph, with no formatting at all. Ouch. Reading source-code samples is downright painful. Reading text is downright painful, if there are more than a few paragraphs and they're not split up in a sane way. At first I figured everyone was just a silly newbie, but once it started happening to posts from people I had read before, it dawned on me that something wasn't right.

So this morning I decided to figure out why. GMail lets me view the full headers and source of the original message, so I did. Looked fairly normal; there was a text version and an HTML version, in multipart/alternative format, which everyone should speak. Both the plain-text and the HTML versions had line breaks (or line-break tags) where they ought to be. Back in the GMail window, it was obvious they were showing the HTML version, because there were placeholders for the GotDotNet image banner at the top.
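If you want to do the same check without squinting at raw headers, here is a minimal Python sketch using the standard-library `email` package. The sample message, `build_sample`, and `count_line_breaks` are all made up for illustration; this is not the actual forum notification. It just walks a multipart/alternative message and counts line breaks in each part.

```python
from email import message_from_string
from email.message import EmailMessage


def build_sample() -> str:
    """Build a stand-in multipart/alternative message (hypothetical content)."""
    msg = EmailMessage()
    msg["Subject"] = "Sample forum notification"
    # The plain-text alternative comes first...
    msg.set_content("First line\nSecond line\n")
    # ...and adding an HTML alternative turns the message into multipart/alternative.
    msg.add_alternative(
        "<span>First line</span><br/><span>Second line</span>", subtype="html"
    )
    return msg.as_string()


def count_line_breaks(raw: str) -> dict:
    """Map each leaf part's content type to the number of line breaks it carries."""
    counts = {}
    for part in message_from_string(raw).walk():
        if part.get_content_maintype() == "multipart":
            continue  # container part, no body of its own
        charset = part.get_content_charset() or "utf-8"
        body = part.get_payload(decode=True).decode(charset)
        if part.get_content_type() == "text/html":
            # Counts <br>, <br/>, and <br /> alike.
            counts["text/html"] = body.lower().count("<br")
        else:
            counts[part.get_content_type()] = body.count("\n")
    return counts


if __name__ == "__main__":
    print(count_line_breaks(build_sample()))
```

If both parts report line breaks, as mine did, the sender is off the hook and the problem is somewhere in the rendering.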

Okay, do a View Source on the GMail window.

Heh. Or not. All I see is a message saying "Javascript is disabled in your browser", together with a few JavaScript calls, all mashed together on one line. Makes it hard to see how GMail rendered my message, doesn't it?

Firefox to the rescue. Firefox lets you select text, and then view the source for just that selection, and it works even for HTML that was rendered by JavaScript. Cool.

So here's the deal: The message arrived with some <table> tags, and those came through intact, as did their relatives <tr>, <td>, etc. The <img> tags were tweaked, of course, to hide the images; but they also came through. <span>? No problem.

The message included some other, mysterious tags — the <ThreadTitle> tag, for example. I'm not quite sure why they put that one in, since nobody's going to have a clue what to do with it. GMail apparently wondered the same thing, because they stripped it out. The text inside it came through; the tag itself did not. Interesting.

That's not the only tag they stripped out, either. The <br/> tags were gone as well. (Sure explains the missing line breaks in a hurry, doesn't it?)

I suspect (though I haven't checked) that <br> tags (without the "/") do come through okay, and that there's a bug in their regex. Duly reported via their feedback system. It's really weird that they would lose XHTML-compliant tags, because they apparently convert all the tags to XHTML casing (all-lowercase) as part of their rendering — <Span> in the e-mail is rendered as <span> by GMail. Unlike, say, my blog's GUI editor, which insists on making all the tag names all-caps (ouch).
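To show the kind of regex slip I have in mind, here is a purely hypothetical sketch in Python. I have no idea how GMail's sanitizer is actually written; `ALLOWED_TAGS`, `BUGGY_SHAPE`, `FIXED_SHAPE`, and `sanitize` are all my own inventions. The idea: keep a whitelist of tags, lowercase the names, drop any tag the pattern doesn't recognize, and leave the text alone. If the pattern forgets about a trailing "/", then `<br/>` falls through the cracks while `<br>` survives.

```python
import re

# Hypothetical whitelist, based only on the tags I saw come through.
ALLOWED_TAGS = {"table", "tr", "td", "img", "span", "br"}

# Anything that looks like a tag at all.
ANY_TAG = re.compile(r"<[^<>]+>")

# Buggy shape: a tag name, optional attributes, then ">".
# Nothing here allows a "/" right after the name, so "<br/>" never matches.
BUGGY_SHAPE = re.compile(r"</?(\w+)(\s[^<>]*)?>")

# Fixed shape: also accepts an optional self-closing "/" before ">".
FIXED_SHAPE = re.compile(r"</?(\w+)(\s[^<>]*?)?\s*/?>")


def sanitize(html: str, shape: re.Pattern) -> str:
    """Drop unrecognized tags (keeping their text) and lowercase tag names."""

    def replace_tag(match: re.Match) -> str:
        tag = match.group(0)
        parsed = shape.fullmatch(tag)
        if parsed is None or parsed.group(1).lower() not in ALLOWED_TAGS:
            return ""  # strip the tag itself; surrounding text is untouched
        name = parsed.group(1)
        return tag.replace(name, name.lower(), 1)

    return ANY_TAG.sub(replace_tag, html)


if __name__ == "__main__":
    sample = (
        "<ThreadTitle>Rule question</ThreadTitle>"
        "<Span>line one</Span><br/>line two<br>line three"
    )
    print(sanitize(sample, BUGGY_SHAPE))
    print(sanitize(sample, FIXED_SHAPE))
```

With the buggy pattern, the `<ThreadTitle>` wrapper disappears but its text stays, `<Span>` comes out as `<span>`, and the `<br/>` silently vanishes, which matches what I saw. Whether that is really what's going on inside GMail is anybody's guess.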