Fixing HTML

Douglas Crockford put up a document titled Fixing HTML that I think can best be summarized using his own words from the document intro: a “proposal for a kinder, gentler HTML 5”. I agree with a lot of the stuff he has in there, though I’m definitely not that hardcore of a web weenie. However, some of it also seems to make life harder for the mobile web folks. Is that really a problem? Has the evolution of mobile handsets and the browsers that go with them eliminated the need for simplified parsing and special treatment? Hard to say. The “HTML Ecosystem” is a bit different in mobile than it is online. Here’s a quick rundown of the participants:

  • Content Producers - This role exists in the online world as well; it’s simply the author of a page. Their disposition is sometimes much different on mobile, however. Content producers who have already made web content for the online world may get frustrated that their skills and knowledge don’t always transfer over.
  • Standards Bodies - Groups like the W3C need to balance the needs of all these folks somehow. And within mobile, the early standards were produced by a completely different set of folks, with a completely different take on the technologies and how the environment would play out.
  • Authoring Tools - People making software that makes it easier for content producers to make their pages. Normally that means abstracting away the details of the markup that goes into a page so that the author doesn’t need to see it. But even if the authoring tool provider follows all the standards, their markup could still produce errors when fed into something like the dotMobi ready.mobi testing tool. How do you provide an authoring tool for a target that’s always shifting?
  • Browser Providers - Folks implementing the browsers themselves. Lately more and more effort in the mobile arena seems to be consolidating behind the WebKit browser engine. It’s used in the Nokia Open Source Browser and in the browser on the iPhone. That’s been nice: you can now expect a degree of consistency across devices. But Mozilla has also reentered the fray, saying they’re going to be revamping their mobile efforts. And there are a ton of existing browsers out there with their own takes on the standards, all making life just a little harder for everyone else.
  • Indexes - The Googles and Yahoos and Ask.coms of the world. They consume HTML so that they can figure out what links to what and extract the content of pages. But there’s also a bunch of special-purpose indexes; take comparison shopping sites, for example. When the HTML standards change, all these folks would potentially have to make updates as well.
  • Gateways - A very mobile-specific player. In the online world, nothing generally stands between your browser and the server returning the web page you’re looking at. In the mobile world that’s different: there’s frequently a gateway sitting in the middle, which will do everything from cleaning up documents that say they’re XHTML but fail XML validation to returning transcoded pages instead of the content you asked for. These folks are supposed to shield the simple devices from inconsistencies, but as a result they have also interfered with publishers trying to handle the issues on their own.

And of course, this is just looking at cellular mobility. The environment shifts if you’re using wifi networks; even using wifi from your phone shifts the mix some, because the gateway is almost always taken out of the picture.

So take something simple like the reintroduction of tag minimization for empty tags. It makes things a lot more friendly for content producers. But it also means that the browser on a device can’t include an XML parser and be done with it, because the content is no longer well-formed XML. Anyone who has spent any time using the web and programming for it knows, however, that just because XHTML is supposed to be XML doesn’t mean that it is. People make mistakes in generating documents, and you’ll always find non-XML documents served as XHTML “out in the wild”. Definitely true, but can you count on the gateway to turn that cruft into XML for you? Solve the problem in the network and keep the endpoint simple?
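To make the parser problem concrete, here is a minimal sketch in Python. It assumes that a bare <br> is a fair example of a minimized empty tag, and it uses Python’s standard library parsers as stand-ins for whatever a device browser would actually ship. The function and class names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumed examples: a bare <br> stands in for a "minimized" empty tag.
MINIMIZED = "<p>Line one<br>Line two</p>"      # empty tag with no close
WELL_FORMED = "<p>Line one<br/>Line two</p>"   # XML-style self-closing tag


def parses_as_xml(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The unclosed <br> makes the closing </p> a mismatched tag.
        return False


class TagCollector(HTMLParser):
    """Hypothetical helper: records every start tag a lenient parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_by_html_parser(markup: str) -> list:
    """Feed markup to a forgiving HTML parser and list the tags it found."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


if __name__ == "__main__":
    for sample in (WELL_FORMED, MINIMIZED):
        print(sample)
        print("  accepted by XML parser:", parses_as_xml(sample))
        print("  tags seen by HTML parser:", tags_seen_by_html_parser(sample))
```

The strict parser rejects the minimized form outright, while the forgiving one reads the same two tags from either version. That gap is the extra work a browser has to take on once it can no longer assume XML.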

Personally, I think not. The intelligence should be in the endpoint, and as soon as possible I would love to see the whole idea of a carrier gateway to the internet go away. That means that mobile browsers really need to start acting like their online counterparts and quit relying on the carrier networks as a crutch to get them well-formed markup. The devices are getting there, I think, but the browsers themselves are still immature compared to their online equivalents (and I don’t mean that as a slam; it just takes time for the technology and codebases to mature). It’s one of the reasons I’m excited to see open source efforts in this area. The “with enough eyes all bugs are shallow” principle should help surface issues more quickly, and shared code at the web browser engine level seems like it would go a long way toward driving a uniform implementation. Whenever everyone reimplements the same standards, you end up with slightly different takes on the same specification. Shared code keeps that from happening as often.
