microformats: process discussion: search results as evidence

Before creating a microformat, the process demands a couple of
concrete actions to be taken first. The general idea is to document
current authorship techniques. If current techniques make it possible
to encode a piece of information, there is no need for a microformat.
Thus, we set ourselves up to make it difficult to create a format.
The specific steps require us to document a copious
number of examples, along with an analysis of what those examples imply.

Take a look at the current recipe efforts: http://microformats.org/wiki/recipe.
(If you are aspiring to create a microformat, this is an excellent example of how to do so.)

What makes the research useful is the list of URLs with the subject
material actually appearing on those pages. In http://microformats.org/wiki/recipe-examples we see a list of
recipes grouped by a useful qualification (in this case the type of
publisher). On the brainstorming page, there is ongoing commentary on
what these examples imply.

Notice that no format is being proposed. Instead, examples are
collected, and the most commonly authored items are proposed as future
properties in a possible future format.

The reason for this post, however, is to highlight the need for
primary sources as evidence. While searching is a useful technique
for finding resources, I’m not convinced search results constitute useful
evidence (unless you are researching how search engines mark up
results), because they don’t contain any substantive material that
can be analysed in the brainstorming effort.

Some may argue that search results constitute evidence that a given
datatype is published; however, I’m not convinced this is true either.
The evidence for data or some property of data being published is
discovered by documenting and analyzing the pages they were actually
published on, and I’ve yet to see a public search engine capable of
performing this task.

-Ben West

implementing email over atom pub

It would be kind of neat to refactor email into a web style system which conforms to atom publishing protocol.

POP3 could be a type of store. When a user wants to check their email, their email client software could ask the atom server “GET /email/messages?start=yesterday%20afternoon&count=100”. The server would respond with a list of members fitting that description. Some metadata could describe whether or not the message has been seen before, and so on.

When the user wants to respond to an email, their client software might “POST /email/messages/02123/replies” with a request body representing a member entry: a single email message with the appropriate metadata. Or it might “PUT /email/people/liz@example.com” instead. The store could be a gateway to any email protocol.
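Here is a rough sketch of what those two requests might look like from the client side. It only builds the requests and sends nothing. The host mail.example.com, the function names, and the choice of Atom elements for the message are my own assumptions for illustration, not part of any real email-over-Atom service.

```python
from urllib.parse import urlencode, quote
from urllib.request import Request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
BASE = "https://mail.example.com"  # hypothetical Atom store, not a real service


def list_messages_request(start, count):
    """Build the GET request for a slice of the messages collection."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"start": start, "count": count}, quote_via=quote)
    return Request(f"{BASE}/email/messages?{query}", method="GET")


def reply_request(message_id, author_email, subject, body):
    """Build the POST that adds a reply entry under an existing message."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = subject
    author = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author, f"{{{ATOM_NS}}}email").text = author_email
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "text"})
    content.text = body
    return Request(
        f"{BASE}/email/messages/{message_id}/replies",
        data=ET.tostring(entry, encoding="utf-8"),
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


check = list_messages_request("yesterday afternoon", 100)
reply = reply_request("02123", "liz@example.com", "Re: lunch", "Sounds good.")
print(check.get_method(), check.full_url)
print(reply.get_method(), reply.full_url)
```

A real client would hand these Request objects to urllib.request.urlopen and would also need authentication, which this sketch leaves out entirely.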

The advantages of this over the traditional email systems aren’t very clear. However, it could potentially lift a technical barrier to entry for an unknown generation of expert user interface designers, which could lead to simpler and more useful lives by making it easier to create user interfaces for complex systems. (E.g., your mail would be pervasive over the web, in addition to not-web. I know it’s weak.)

bad amazon recommendations

Anyone know if/how I can tell amazon to never show me microsoft products, among other things?

atom pub, gdata, and one way to improve amazon+alexa

Has anyone tried using Amazon’s “RESTful” webservices? I’ve had many headaches using them. If I’m using PHP, I like to use nuSOAP to access SOAP services, because PHP’s facilities for XML (at least in version 4) are terrible. nuSOAP seems to have interop issues with some of Alexa’s/Amazon’s webservices, so that’s a pretty big turn off. The authentication requirements remain difficult/painful to implement, for some reason, even after doing it several times.

The focus on composability seems like a pretty nifty idea, however. S3 is for storing things, SQS is for queue services, and EC2 is very cool for providing machines to run code on. The only problem is that once you develop a client for one, you don’t have a working client for the others. They failed to provide a polymorphic interface to their services! On the web, this is really, really bad form. In my short-lived and naive experience, writing web services is primarily about two things:

  1. exposing some logic controlling a persistence layer
  2. organizing resources into logical units

These two activities become idioms, which become patterns. Open standards can save you a lot of work when it comes to interoperability issues, because they help define, or at least imply, how these patterns should be implemented. If amazon had used Atom Publishing Protocol, instead of the framework they are currently using, a single client implementation would be capable of consuming all their web services. What a mistake to not use it.

Furthermore, it’s a bit confusing as to why they labeled these services as RESTful to begin with, especially because they ignore the stateless requirements of GET, and put methods in the URIs. Any parser implementation intended as a component of a consuming application must learn how to parse each resource’s XML independently from the rest.
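To make the "single client" point concrete, here is a minimal sketch of one function that lists the members of any Atom collection feed. The two feeds are made up: I am pretending a storage service and a queue service both exposed their contents as Atom, which Amazon's services do not do. The URNs and hrefs are placeholders.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def list_members(feed_xml):
    """Return id, title and edit link for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    members = []
    for entry in root.findall(f"{ATOM}entry"):
        edit = entry.find(f"{ATOM}link[@rel='edit']")
        members.append({
            "id": entry.findtext(f"{ATOM}id"),
            "title": entry.findtext(f"{ATOM}title"),
            "edit": edit.get("href") if edit is not None else None,
        })
    return members


# Hypothetical feeds: a storage bucket and a message queue, both as Atom.
STORAGE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>bucket: photos</title>
  <entry>
    <id>urn:example:object:1</id>
    <title>cat.jpg</title>
    <link rel="edit" href="/storage/photos/cat.jpg"/>
  </entry>
</feed>"""

QUEUE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>queue: thumbnails</title>
  <entry>
    <id>urn:example:message:7</id>
    <title>resize cat.jpg</title>
    <link rel="edit" href="/queue/thumbnails/7"/>
  </entry>
</feed>"""

# The same parser handles both services; nothing is service-specific.
for feed in (STORAGE_FEED, QUEUE_FEED):
    for member in list_members(feed):
        print(member["title"], "->", member["edit"])
```

The service-specific payload would still need its own handling, but listing, locating, and editing members would work the same way everywhere.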

I’m not saying Atom Publishing Protocol is some magic silver bullet. I am suggesting that Atom Publishing Protocol helps you with one of the main tasks of developing web services: organizing and exposing resources.

Is anyone doing this? Yes. Google has added some extensions to Atom Publishing Protocol, and has called it GData. If amazon wants to compete with google in terms of being a platform, it’ll need to simplify the ways consumers get access to resources while increasing the surface area of exposure. One way to do that is to make sure that the burden for developers is very low, such as providing a single consistent interface (which HTTP already provides, and web architecture strongly implies) capable of consuming all your resources.

C’mon, Amazon! Fight NIH! Don’t re-invent: re-use the known good patterns and idioms for publishing on the web. Atom publishing protocol is one possible codification of those common idioms and patterns, and it’s an open standard. It might not work out, but it’s certainly seen a lot more peer review than your unRESTful web services model.

improving HTML

The W3C has restructured some working groups in the Interaction Domain. The new HTML working group is co-chaired by Dan Connolly of W3/MIT and Chris Wilson from Microsoft.

Both WHATWG and W3C are pursuing an improvement in HTML while simultaneously adding new features. My reading of their charters seems to indicate the following:

  • WHATWG is focusing primarily on disambiguation of the HTML spec to trigger developer adoption, with new features as a secondary goal.
  • W3C is focusing primarily on new features to trigger market adoption, with disambiguation/improvement as a secondary goal.

Over the last few years, as my interest in usability has grown, I have also become curious about the issues surrounding how technology becomes adopted by society. Contrary to intuition, when a new technology becomes available, society at large takes a long time to adopt it. When zippers became available, it took 100 years before they became popular in clothing. Zippers offered a new and clearly superior technique for fastening over buttons, however, despite garment fastening consistently being one of the primary use cases for zippers, it took a long time before society actually started using zippers. (I recommend Evolution of Useful Things.)

While I’m glad for the effort of the W3C and WHATWG, and even look forward to participating in some of that work, I’m a bit cautious regarding their expected outcomes. What is the plan for pursuing adoption? I’m not convinced that adding new features is enough of an incentive to trigger market adoption because that model doesn’t fit the historical record very well. If there’s anything I’m learning, it’s that constant iterative improvement that is responsive to previous failure is the single most important process to improve technology with regard to market and societal adoption. Seeing the future is harder than seeing the past.

The WHATWG has a huge jumpstart over the W3C on discovering what the problems with the original HTML were. I sort of think they would be better off with a Firefox kind of methodology: spend defined periods of time alternating between fixing past faults and adding new features. Shorten the feedback loop to discourage hegemonic implementations from running the market.

The WHATWG has active involvement from browser vendors, and many content producers and developers. The W3C HTML WG has just started, and probably has the most inclusive participant policy to date. Anyone can become an “invited expert” by successfully filling out a few forms. I’m currently in the middle of this process. The group’s co-chair is from Microsoft, but I believe Microsoft is completely absent from the WHATWG processes. While IE7 and the team that worked on it have focused on improving IE to become more conformant to interoperable standards, the legacy of IE6 and Microsoft’s historical commitment to open standards weigh heavily in my mind. What dynamics will determine how these groups work together?

I suppose we’ll see what happens in the months and years to follow.

issue 42

I chuckled a bit at issue 42 over at the TAG.

s-expression templating in python

I’m constantly on the hunt for the perfect templating module. When I was much younger, primarily doing PHP, I quickly saw that string concatenation and variable interpolation were not good solutions when trying to author lots of complex documents. I did some searching, and found Smarty, which satisfied my needs at that time. Over the past year, I’ve started using python primarily, and my desires have changed.

Real usage provides the practical definitions for software requirements. For now, I’d like to take a look at approaches I’d personally consider using for generating XML. I consider XML generation a primary task, and we can talk about HTML generation some other time. In fact, I’m currently satisfied using XSLT to transform XML into HTML.

My first foray into templating engines was kid. It had xml-based syntax, with variable interpolation reminiscent of smarty and php. It was supposed to be a kind of simpler, procedural XSLT-like templating language. However, I ended up not liking it very much. I know Genshi is around, which promises to make certain idioms simpler and advertises better performance, but I think my issue isn’t so much with the syntax as with some of the conceptual foundations. Then again, I was attempting to use it for HTML and XML generation. Perhaps if I reconsidered it for purely XML generation, I might change my mind.

Anyway, Kid has been around since November of 2004 (see line #236).

During my PHP days, I remember reading some LISP documentation. I found one lisp package that suggested that HTML looked an awful lot like s-expressions, but with uglier syntax. This really intrigued me. Earlier this year, I discovered stan, which is a python library implementing s-expression based html/xml generation through clever use of __call__. The basic idea is that you can invoke the name of the tag you want, provide keyword arguments to fill out the infoset’s attributes, and use list syntax to fill out the contents of the object. There’s not as much overhead for infoset processing as there is with other XML toolkits, so presumably it performs quite well, although I’ve never tested it.
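To show the general idea, here is a tiny sketch of the technique. This is not stan’s code or API: the Tag class and everything in it is my own toy illustration of how __call__ can take attributes and square brackets can take content.

```python
from xml.sax.saxutils import escape, quoteattr


class Tag:
    """An XML element built s-expression style: tag(attrs)[children]."""

    def __init__(self, name, attrs=None, children=()):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = tuple(children)

    def __call__(self, **attrs):
        # Calling a tag returns a copy with the keyword arguments as attributes.
        return Tag(self.name, {**self.attrs, **attrs}, self.children)

    def __getitem__(self, children):
        # Indexing a tag returns a copy with one child or a tuple of children.
        if not isinstance(children, tuple):
            children = (children,)
        return Tag(self.name, self.attrs, self.children + children)

    def __str__(self):
        attrs = "".join(f" {k}={quoteattr(str(v))}" for k, v in self.attrs.items())
        if not self.children:
            return f"<{self.name}{attrs}/>"
        body = "".join(
            str(c) if isinstance(c, Tag) else escape(str(c)) for c in self.children
        )
        return f"<{self.name}{attrs}>{body}</{self.name}>"


# Any vocabulary works; these happen to be Atom element names.
feed, entry, title, link = (Tag(n) for n in ("feed", "entry", "title", "link"))

doc = feed(xmlns="http://www.w3.org/2005/Atom")[
    title["Recipes & notes"],
    entry[title["Pancakes"], link(href="/recipes/pancakes", rel="alternate")],
]
print(doc)
```

Because each call and each index returns a new Tag, the bare names such as title stay reusable as empty prototypes. Text children are escaped on output, so the ampersand in the title comes out as an entity.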

Stan is complicated though. It seems to be intended for primarily HTML generation, and it’s unclear how to use it in the same manner as suggested for HTML but for predefined XML vocabularies. There is some kind of callback system that I haven’t quite grokked, despite reading “meet stan” several times over. However, I like the approach.

I looked real hard for the history of stan. The earliest changeset I could find was 564, but it had no information and no file. So the second earliest changeset I could find is dated February 2004. I didn’t see any CHANGES or HISTORY files, but I would assume the author probably was noodling on stan for a few months before that. Please correct me if you know otherwise.

Stan also has a lot of dependencies, and can be a bit tricky to install. For some reason or another, I was unable to get it using the cheeseshop/distutils/easy_install method. Then a short while ago, I stumbled upon breve. It claims to be a fresh rewrite of TurboStan, by the author of turbostan. Basically, the story here is that the author created a turbogears plugin that allowed stan to be used in turbogears applications. After doing this, the author became wiser, and decided to rewrite stan without all the dependencies, and called it breve. I believe breve was written this year (2007) or late 2006, but I can’t quickly find any resources to verify, and I’m tired of researching.

On a side note, using AJAX for testing is an interesting idea.

Since Breve is pretty much similar to stan, I still don’t quite fully understand how the renderer and context stuff works. However, they do explain how to facilitate new xml syntax and added a “when” facility for easier conditionals. I think this is my current favorite, and I look forward to using this in projects going forward. What would be really nifty is to go from relax-ng to s-expression xml generation. Breve uses xsd to provide you with new xml tags, but I think it’d be better to go with relax-ng, especially the compact syntax. Optimally, it would have an option to validate your syntax for you, as well.

Today, I was reading through Ian Bicking’s stuff, when I found (among many many interesting things) py.xml, which claims to have been inspired by XIST. XIST is ANOTHER s-expression based xml generation library. They again chose to focus on producing HTML (I think this is a mistake). This one uses named parameters for attributes, and unnamed parameters as content. It’s also much older: the oldest entry is from some time in 2000, and it’s clear that it was around for some time before that entry (although they jokingly claim written history, heh). However, I had never heard of XIST until today. I also can’t find changesets or any history files to determine how long py.xml has been around, so I’ll assume it’s recent.

While these packages are capable of producing HTML, and the people authoring HTML could use it, I think it’s clear that the approach hasn’t worked. I think the authors of these libraries should start focusing on helping developers use them to produce XML, then tackle the problem of producing HTML separately. My guess is that the most common authors of HTML are graphic designers. I know some graphic designers who would be able to use this approach, but I also know several graphic designers, and probably even several programmers, who would not be able to use this approach. Too much overhead, and too many layers of indirection make it difficult to use.

In the spirit of the web, instead of attempting to unify or divide the files and code website authors share, let’s divide the URIs we share. One set of URIs for web developers to emit and consume XML, and another set of URIs for web designers to emit HTML, and consume the actions of humans. The two sets of URIs can talk amongst themselves by passing messages… over the web. This approach is more scalable. To even further separate the business logic of an application from presentation of the user experience (or web experience), you can have the web experience instruct the user (using normal HTML) to submit their forms and other interactions with side-effects directly to the developer’s set of URIs (perhaps call it the web-app backplane?). Then when the web application has finished processing state and interaction, it can redirect the user back to the web experience, where they are given a nice human-consumable web page. This can also reduce accidental replays. It also allows designers and developers to work together without stepping on each other’s toes. It’s easier to test. It’s scalable.
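A minimal sketch of that split, written as two plain WSGI callables so it needs no framework. Everything named here is an assumption for illustration: the /backplane/comments and /experience/comments paths, the in-memory COMMENTS list standing in for real storage, and the small call helper that invokes an app without a server.

```python
import html
import io
from urllib.parse import parse_qs

COMMENTS = []  # stand-in for the application's real persistence layer


def backplane(environ, start_response):
    """Developer-owned URI: accept the form POST, change state, redirect."""
    size = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
    COMMENTS.append(form.get("comment", [""])[0])
    # 303 See Other sends the browser back to the designer-owned page with
    # a GET, so reloading that page does not replay the POST.
    start_response("303 See Other", [("Location", "/experience/comments")])
    return [b""]


def experience(environ, start_response):
    """Designer-owned URI: plain HTML whose form posts to the backplane."""
    items = "".join(f"<li>{html.escape(c)}</li>" for c in COMMENTS)
    form = ('<form method="post" action="/backplane/comments">'
            '<input name="comment"><button>Send</button></form>')
    page = f"<ul>{items}</ul>{form}"
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [page.encode("utf-8")]


def call(app, method="GET", body=b""):
    """Invoke a WSGI app directly and return (status, headers, body)."""
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    environ = {
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = app(environ, start_response)
    return captured["status"], captured["headers"], b"".join(chunks)


status, headers, _ = call(backplane, "POST", b"comment=hello+%3Cworld%3E")
print(status, headers["Location"])
print(call(experience)[2].decode("utf-8"))
```

The call helper is also the "easier to test" part: each side can be exercised on its own, with no browser and no running server.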

P.S. any one know how to get decent looking code blocks in wordpress?

create a command for setup.py

I’ve been looking into distutils stuff lately. It seems like really nice stuff… it’s capable of all sorts of nice things. One of the things I’d like to do is add some commands for creating releases, which involves automatically creating a release branch in perforce. I was having a lot of trouble getting enough information to do this, so here it goes. I also found a very informative skeleton on lesscode.


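# setuptools.Command extends the distutils base class; on older Pythons,
# "from distutils.core import Command" works as well.
from setuptools import Command


class HelloWorld(Command):
  """This is a simple hello world command addition."""
  description = "print a greeting"
  # user options is a list of tuples. each tuple conforms to:
  # ('long-arg-name=', 'a', "Description of option.")
  user_options = []
  command_name = "helloworld"

  def initialize_options(self):
    # set a default value for every option listed in user_options
    pass

  def finalize_options(self):
    # validate or post-process option values once they have been parsed
    pass

  def run(self):
    print("hello world.")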

You’ll also need to add setup(cmdclass={'helloworld': HelloWorld}, ...) to setup.py, as explained in the python docs. After that, running python setup.py helloworld invokes the command.

toolbox: breve for xml generation in python

Every once in a while, I head over to the cheeseshop and just scroll down. It’s really nice to see so many cool things. Today, I caught Breve, which is evidently inspired by Stan. The guy that wrote it wrote a Turbogears plugin to support stan, and seems very familiar with it, which drove him to frustration, and breve is the product of a complete rewrite. I think this is terrific. I’m looking forward to using it.

The author, Cliff Wells, in a blog post called Python template engines – why reinvent PHP?, ponders why templating authors don’t seem to like stan-a-likes. I think there are some usability principles at work here: monotony and consistency. When I write a template, it helps if the output looks fairly similar to the input. So if the output is HTML, I basically want the input to look very similar. When I read the stan docs, it’s not quite clear how to do some of the things that commonly need to be done in templates: conditionally output some class name, or block of HTML. With simple, procedural templates it is clear, which is why php-style templating engines are so successful.

I think the chief audience of breve, and similar efforts, is people that need a templating language for XML. That’s how I plan to use it. I plan on outputting pure XML, and then using XSLT to transform it into whatever format I need.

I wonder what part a templating engine would play in an atom publishing system. Would it even be necessary if the API was designed well? I believe amplee’s API includes a mechanism by which objects know how to render themselves as XML. Since the Atom syntax is constrained to a standard specification, this is not too much of a pain.

Cliff mentioned in a recent blog post that he was using pylons and breve to develop a new blog. I hope he takes a look at brightcontent before spending a lot of effort.

serverpronto doesn’t understand privacy or security


http://serverpronto.infolink.com/esupport/index.php?_a=tickets&_m=viewmain
&emailre=bewest@gmail.com&ticketkeyre=d7f62b8b&_i=BAU-34900

I hope they eventually realize their ticket system isn’t secure, and that putting credit card numbers in tickets is a very bad idea.

UPDATE: I started hunting around for more information about this company. Evidently the Better Business Bureau can’t even figure out basic information about ServerPronto / InfoLink. The “address” they publish is simply a drop box, not a physical location. Buyer beware!