definition of microformat

A microformat is a specific collection of names, values, and accompanying structure defined through rigorous market research intended to consider pervasive use of semantic html that increases data fidelity in HTML-borne data widely distributed on the web. Microformats are more than one of these, eg: “hcard and hcal are microformats.”

This is my definition for microformat. It’s not an official definition, but it appears on the What are microformats? wiki page. I thought I’d spend some effort filling out this dense definition a bit more, especially since there are some misconceptions out there about what microformats are.

HTML and Extensions

The most salient aspect of a microformat is that it is a mini-format inside HTML. HTML provides some hooks for generalized extensions. Most notable is the class attribute, which the W3C describes in the HTML4 spec:

The class attribute, on the other hand, assigns one or more class names to an element; the element may be said to belong to these classes. A class name may be shared by several element instances. The class attribute has several roles in HTML:

* As a style sheet selector (when an author wishes to assign style information to a set of elements).
* For general purpose processing by user agents.

The example they give uses class names of “info”, “warning”, and “error.” This is a great example, because these class names are laden with semantic value. It shows how authors can use class as a general extension mechanism to add value to their HTML documents. The example then goes on to show how this semantic value is simultaneously useful for styling with CSS; the two uses are not mutually exclusive.
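To make the “general purpose processing by user agents” role concrete, here is a minimal sketch of a tiny user agent that groups paragraphs by class name. The sample markup and the names `ClassCollector` and `group_by_class` are my own inventions for illustration; only the class names “info”, “warning”, and “error” come from the HTML4 example.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical page fragment; only the class names echo the HTML4 example.
SAMPLE = """
<p class="info">Backups run nightly.</p>
<p class="warning">Disk space is low.</p>
<p class="error urgent">Backup failed.</p>
<p>No class here.</p>
"""


class ClassCollector(HTMLParser):
    """Group the text of each <p> element under every class name it carries."""

    def __init__(self):
        super().__init__()
        self.by_class = defaultdict(list)
        self._current = None  # class names of the <p> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # The class attribute holds a space-separated list of names.
            self._current = (dict(attrs).get("class") or "").split()

    def handle_data(self, data):
        if self._current and data.strip():
            for name in self._current:
                self.by_class[name].append(data.strip())

    def handle_endtag(self, tag):
        if tag == "p":
            self._current = None


def group_by_class(html):
    """Return a dict mapping each class name to the paragraph texts using it."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return dict(collector.by_class)


messages = group_by_class(SAMPLE)
print(messages["warning"])
```

The same class names a stylesheet would select on are enough for a program to pull out every warning on the page, which is the point of the spec’s example.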

The other hooks in HTML for general extensions are the profile attribute, the meta element, and the rel/rev attributes. However, microformats tend to rely on class for most extensions. (See why visible metadata is better.) Lots of people use class this way, but not every instance of using class with well-crafted HTML is a microformat.
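The rel attribute works in much the same way as class: it holds a space-separated list of values that a program can read. As a rough sketch, the snippet below collects the rel values from links. The sample markup, the URLs, and the names `RelCollector` and `extract_rel_links` are assumptions made up for this illustration; “friend” and “met” are relationship values of the kind XFN defines.

```python
from html.parser import HTMLParser

# Hypothetical markup: one link annotated with rel values, one plain link.
SAMPLE = (
    '<a href="http://example.com/alice" rel="friend met">Alice</a> '
    '<a href="http://example.com/">plain link</a>'
)


class RelCollector(HTMLParser):
    """Collect (href, rel values) pairs for links that carry a rel attribute."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # Like class, rel is a space-separated list of values.
        rel_values = (attributes.get("rel") or "").split()
        if rel_values:
            self.links.append((attributes.get("href"), rel_values))


def extract_rel_links(html):
    """Return every annotated link found in the given HTML string."""
    collector = RelCollector()
    collector.feed(html)
    collector.close()
    return collector.links


links = extract_rel_links(SAMPLE)
print(links)
```

The plain link is skipped because it makes no claim about a relationship; only the annotated link contributes data.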

Symbolism and Representations

A microformat isn’t just any use of semantic HTML. Nor is it just any attempt to encode information in HTML, although that is the primary use case. Data formats, in general, are representations of data. They are symbolic: a painting of a chair isn’t actually a chair, it’s a representation of a chair. In the technical world, we like our representations to have high data fidelity. When a format has high data fidelity, the chances are good that the representation of an object very closely matches the actual object. In HTML, this has traditionally been very difficult because of HTML’s low data fidelity. If I have a docbook document and choose to publish it in HTML, it would be difficult to transform it back into docbook. Even if I were able to do so, the resulting document would not be exactly the same as the original docbook document. This is low data fidelity in action.

However, using the general extension mechanisms we saw earlier, it’s possible to construct an HTML document that a parser could transform back into the original docbook document. This is one of the main goals of microformats: raising the data fidelity of HTML while ensuring the data represented is visible.
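Here is a toy round trip to show what higher data fidelity buys you. I use a small contact record rather than docbook to keep it short. The class names `vcard`, `fn`, `org`, and `tel` are borrowed from hcard, but everything else (the `to_html` and `from_html` helpers, the `ContactParser` class, the sample contact) is hypothetical, and this is nowhere near a conforming hcard parser.

```python
from html import escape
from html.parser import HTMLParser

# A few hcard property names; the rest of this sketch is illustrative only.
FIELDS = ("fn", "org", "tel")


def to_html(contact):
    """Publish a contact dict as visible HTML, one classed span per field."""
    spans = " ".join(
        f'<span class="{name}">{escape(contact[name])}</span>'
        for name in FIELDS
        if name in contact
    )
    return f'<div class="vcard">{spans}</div>'


class ContactParser(HTMLParser):
    """Recover field values from elements whose class names a known field."""

    def __init__(self):
        super().__init__()  # character references arrive already decoded
        self.contact = {}
        self._field = None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        # Remember which known field, if any, this element marks up.
        self._field = next((c for c in classes if c in FIELDS), None)

    def handle_data(self, data):
        if self._field:
            self.contact[self._field] = self.contact.get(self._field, "") + data

    def handle_endtag(self, tag):
        self._field = None


def from_html(html):
    """Parse HTML produced by to_html back into a contact dict."""
    parser = ContactParser()
    parser.feed(html)
    parser.close()
    return parser.contact


original = {
    "fn": "Ada Lovelace",
    "org": "Analytical Engines & Co.",
    "tel": "+1-555-0100",
}
published = to_html(original)
recovered = from_html(published)
print(recovered == original)
```

Feed the same text to `from_html` without the class names and you get an empty dict back: the words are still on the page, but the structure is gone.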

Paving the Cowpaths

If not all attempts to raise data fidelity using the generic extension mechanisms of HTML are microformats, which ones are and which ones are not? That’s where the concept of market research comes in. The idea is that some pieces of data are very common.

I’m working on a theory I call “environmental data types”, and it is these that I believe are most common. For example, new technologies typically embrace representing certain kinds of data first. What are these data types? The most common ones appear to be events and people. Everyone I know has used some kind of technology that represents events and/or people. Calendars must be one of the first technological inventions of the human race (I have no evidence for this claim).

Anyway, as an example, if lots of people are interested in publishing information about dates/events (hcalendar) and people/contacts (hcard), and perhaps the relationships between them (XFN), the idea is that certain mechanisms for doing so will be more common than others. That is to say, there are evolutionary/economic forces acting upon which information people choose to publish and which mechanisms they use to publish it. The microformats organization attempts to discover which data types are common, along with the most common ways of publishing them. We call this “the process.” It involves collecting examples of things people are actually publishing on the web, looking for the commonalities, and making a standard out of the results of this research.
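The real process is research done by people, not by a script, but the “looking for the commonalities” step can be sketched in a few lines. The snippets below are invented stand-ins for examples collected from real sites, and `ClassNames` and `class_frequencies` are hypothetical helper names.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical snippets standing in for markup collected from different sites.
EXAMPLES = [
    '<div class="contact"><span class="name">A</span> <span class="phone">1</span></div>',
    '<li class="person"><b class="name">B</b> <i class="email">b@example.com</i></li>',
    '<p class="contact"><span class="name">C</span> <span class="email">c@example.com</span></p>',
]


class ClassNames(HTMLParser):
    """Collect the distinct class names used anywhere in one snippet."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        self.names.update((dict(attrs).get("class") or "").split())


def class_frequencies(snippets):
    """Count in how many snippets each class name appears."""
    counts = Counter()
    for snippet in snippets:
        parser = ClassNames()
        parser.feed(snippet)
        parser.close()
        counts.update(parser.names)  # each name counts once per snippet
    return counts


frequencies = class_frequencies(EXAMPLES)
print(frequencies.most_common(1))
```

Names that show up across many independent publishers are candidates for the convention; names that appear once are probably one author’s private vocabulary.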

This is a radical departure from many engineering efforts, which involve predicting the future by attempting to consider all possible use cases before they occur. These predictions are historically performed by a handful of experts, spending lots of time attempting to flush out any potential problems. The microformats approach considers what has worked and is working, distills the lessons learned from experience, and proposes that people now start doing it the new way. Since the microformat is based on what people actually do, it is often trivial for them to gain the benefits of the new technology because it fits what they already do. The beauty of this is that it’s easy to make incremental changes, and easy to gain adoption. This last bit is the really amazing part: the microformats approach solves the social problems associated with gaining mind share and use as part of its core methodology. It also means a microformat has already scaled to web-scale proportions before it becomes a microformat.

What does it all mean?

We’ve seen that there are general mechanisms for extending HTML. These techniques are used by web authors and developers to represent different kinds of information. When a particular application of these techniques is used to represent a common data type (such as contacts), Microformateers are able to apply the microformats methodology to discover the commonalities, and share the results with the web community that produced them. It is this end result that is a microformat: an application of general HTML extension techniques to a specific type of data that has been vetted through real use on the web, and conventionalized by the engineering methodology promoted by Microformateers. Or, as I like to say: “A microformat is a specific collection of names, values, and accompanying structure defined through rigorous market research intended to consider pervasive use of semantic html that increases data fidelity in HTML-borne data widely distributed on the web.”

What can I do?

One of the nice side-effects of the microformats process is that the engineering work is an open, community effort. Anyone can join the wiki, the mailing lists, and the IRC channel to contribute. Contributions typically entail collecting examples from the web of representation techniques and the data they represent, looking for commonalities between these techniques, and codifying these conventions into a standard. This can be hard work, but if you want to help microformats without too much effort, you can help by using semantic HTML. The techniques that microformats employ to extend HTML are available to any publisher or developer, and by using them, and encouraging their use, you are helping microformats! But please don’t call it a microformat. If your techniques are reproduced by many other people, the microformats process can be applied to them, and everyone will benefit from your clever authoring.

