Should HTML be considered a data format?
As HTML becomes more and more semantic, at least in intent, and styling moves into CSS, one has to wonder what it now represents. It looks like a format for unstructured data (a.k.a. rich text), in the same sense that XML and JSON are formats for semi-structured and structured data and CSV is a format for tabular data.
If that is the case, it should become commonplace for this data to be rendered by a variety of clients, not just browsers. This has already begun, of course: an RSS feed reader, for example, consumes HTML, word processors can read and write HTML, and e-mail clients use HTML for rich text. Admittedly, most of the time these applications work by embedding a web browser, but that doesn’t need to be the case.
If HTML becomes truly semantic (and if we can ignore the huge majority of existing content, which is less than ideally written), you could imagine it being rendered in many different ways. For example, you could collapse it to an outline, you could consume it as a repository, or you could even display it in a completely different, non-CSS-driven rendering engine. The point here is that semantic HTML and CSS promise to decouple data from its graphical representation, and that decoupling is an opportunity to use HTML to its full potential as a data format.
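To make the outline idea a bit more concrete, here is a minimal sketch in Python that treats an HTML document as data and collapses it to its headings, with no browser or CSS involved. It uses only the standard library's html.parser module. The names OutlineParser, html_outline and format_outline are made up for this illustration, and the sketch assumes reasonably well-formed markup in which h1 to h6 actually reflect the document's structure, which is exactly the "truly semantic" assumption above.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineParser(HTMLParser):
    """Collects a (level, text) pair for every h1..h6 element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, heading text) in document order
        self._level = None   # level of the heading currently being read, if any
        self._chunks = []    # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._chunks = []

    def handle_data(self, data):
        # Keep text only while inside a heading, including nested inline tags.
        if self._level is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Join the fragments and normalize runs of whitespace.
            text = " ".join("".join(self._chunks).split())
            self.outline.append((self._level, text))
            self._level = None


def html_outline(html):
    """Return the headings of an HTML string as a list of (level, text)."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


def format_outline(outline):
    """Render the outline as plain text, indenting two spaces per level."""
    return "\n".join("  " * (level - 1) + text for level, text in outline)


if __name__ == "__main__":
    page = "<h1>Title</h1><p>Intro</p><h2>Part <em>one</em></h2><h2>Part two</h2>"
    print(format_outline(html_outline(page)))
```

The same list of (level, text) pairs could just as well feed a table of contents, a search index or a non-browser renderer, which is the sense in which HTML starts to behave like a data format rather than a page description.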
I realize these thoughts might seem a little vague. This post really is a call for comments and ideas. Does this make sense or are we in the middle of Obviousland?