Annotation, Microformats & Syndication

Syndication feeds (RSS and Atom) provide precise information: the title of each entry, the author, when it was created, when it was modified, a unique identifier for the post, the content of the post without any surrounding menus, graphics, advertising, etc. This metadata supports many of the features of aggregators and blog search tools. But there’s a problem: the entries in a feed don’t last. Without a standard way to represent the information in HTML, it is lost to the Web. As far as I know, no such standard exists.

Why is this a problem? Well, for a start it means that if I suddenly add a large number of entries to my blog, Technorati, Feedster, PubSub et al. will not index the older ones unless they are in the feed. Furthermore, if someone comes up with a cool new blog search technology or the like, much of the data will simply not be out there to be indexed. (This also increases the first-mover advantage of the existing services, which have already indexed the no-longer-available data.)

Blog search tools are not the only services that could use the data. This is also a problem with my ideas for integrating wikis and forums. It even turns out to be relevant within an AJAX application, like the work I’m doing on web annotation.

My annotation implementation allows users to highlight text in a forum post and add notes in the margin, as one might underline text and write notes in a paper book. Each of these annotations is stored in a database, along with information about the annotated post, such as the post’s ID, title and author. I store these with the annotation so that they can be retrieved independently and treated consistently regardless of what was annotated; this makes it much easier to add annotation to other web applications. The Javascript that manages the display and editing of annotations also needs to locate the post on the web page so that it can show the highlights and margin notes.

This metadata (post title, author, etc.) is already present on the page. So I added CSS classes to the HTML to indicate what is a post, and for each post which tags contain the title, the author, the body text, etc. This is great: adding support for annotation to a page requires inclusion of the Javascript and a few minor tweaks to the HTML.

One question remains: what should these CSS classes be? Is there a standard out there for this sort of thing?

Well, there is Dublin Core. I could use dc:title as a class on the HTML element containing a post title. Of course HTML 4 doesn’t support namespaces, but at least for human beings who have worked with Dublin Core this is crystal clear, and unlikely to conflict with any classes in Moodle. But this is considered evil.

A more promising precedent is the use of microformats. The hReview microformat, for example, uses classes in a regular HTML document to specify a review of a movie, book, etc. These use simple, unqualified classes like “title”, “author”, etc. But I can’t find a microformat for solving the problem of syndication feeds. For the moment, I’m basing my classes on the names of tags in the Atom specification.
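A minimal sketch of how a script might read post metadata from class-tagged HTML. The class names ("title", "author", "content") and the helper functions are assumptions for illustration, loosely following Atom element names; they are not the classes actually used in the annotation code. The sample post is a plain object standing in for a DOM subtree so that the sketch also runs outside a browser.

```javascript
// Hypothetical class names, loosely modelled on Atom element names.
const FIELD_CLASSES = ["title", "author", "content"];

// Depth-first search for the first descendant carrying the given class.
function findByClass(node, className) {
  for (const child of node.children || []) {
    const classes = (child.className || "").split(/\s+/);
    if (classes.includes(className)) return child;
    const found = findByClass(child, className);
    if (found) return found;
  }
  return null;
}

// Collect the text of each known field inside one post element.
function extractEntry(postNode) {
  const entry = { id: postNode.id };
  for (const field of FIELD_CLASSES) {
    const node = findByClass(postNode, field);
    entry[field] = node ? node.textContent.trim() : null;
  }
  return entry;
}

// Plain-object stand-in for a post element (assumed structure).
const samplePost = {
  id: "post-3",
  className: "entry",
  children: [
    { className: "title", textContent: "Jabberwocky", children: [] },
    {
      className: "meta",
      children: [{ className: "author", textContent: " geof ", children: [] }],
    },
    { className: "content", textContent: "Twas brillig", children: [] },
  ],
};

const extracted = extractEntry(samplePost);
console.log(extracted);
```

Because the function only relies on `children`, `className` and `textContent`, the same code should work on real elements in a browser, though that has not been tested against the Moodle pages discussed here.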


Annotation Status

I have made a number of improvements to the web annotation code during the month of May (follow the link for a current screenshot). Since these are increasingly integrated with Moodle, I do not intend to update the static example for the foreseeable future.

Changes include a number of user interface improvements, such as more convenient buttons and automatic display of all annotations for a discussion. I experimented with a number of interfaces for creating new annotations, including a pop-up right-click menu and a shifting insertion caret, but settled on a clickable margin as the most practical user-friendly solution. Finally, I have added experimental syndication support, so that users can subscribe to a discussion and be notified when annotations are added or modified. This is only a trial: for it to be useful, it would also require the ability for users to view each other’s annotations.

Many of the most important changes are technical, and are not apparent in the interface. Though the annotation system is necessarily embedded in Moodle, I want to make it easy to take the code and use it elsewhere. To this end, I made a number of architectural changes. For a start, I moved to a REST architecture and adopted nicer URLs, both of which should make the system easier to understand and integrate. I have also switched to using Atom as an intermediate data format. There has been discussion on the Web lately about uses of RSS and Atom for representing metadata; annotation seems like an ideal use. This has the added benefit that a whole class of aggregation software is capable of reading the annotation data. However, because the Atom specification has not yet been finalized, I have implemented RSS 2.0 support for experimentation in the interim.
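As a rough illustration of annotation data carried in Atom, here is a sketch that serializes one annotation as an Atom entry. The field names on the annotation object (`id`, `quote`, `note`, `user`, `updated`, `postUrl`) are assumptions made for this example, and the element names follow Atom as it was later published (RFC 4287) rather than whatever draft the code above targeted. The mapping of the highlighted text to the title and the note to the content is one possible choice, not the one the annotation service necessarily makes.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize one annotation record as an Atom <entry> fragment.
// The property names on `annotation` are hypothetical.
function annotationToAtomEntry(annotation) {
  return [
    "<entry>",
    `  <id>${escapeXml(annotation.id)}</id>`,
    `  <title>${escapeXml(annotation.quote)}</title>`,
    `  <updated>${escapeXml(annotation.updated)}</updated>`,
    `  <author><name>${escapeXml(annotation.user)}</name></author>`,
    `  <link rel="related" href="${escapeXml(annotation.postUrl)}"/>`,
    `  <content type="text">${escapeXml(annotation.note)}</content>`,
    "</entry>",
  ].join("\n");
}

// Example record with made-up values.
const atomEntry = annotationToAtomEntry({
  id: "tag:example.org,2005:annotation/17",
  quote: "the slithy toves",
  note: "What is a tove?",
  user: "geof",
  updated: "2005-05-30T12:00:00Z",
  postUrl: "http://example.org/forum?id=21&post=3",
});
console.log(atomEntry);
```

In a complete feed the entry would sit inside a `<feed>` element declaring the Atom namespace (`http://www.w3.org/2005/Atom`).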


Designing Nice Annotation URLs

While I’m working on web annotation for Moodle, I want the solution to be more broadly applicable. To that end, I have been working on making the service RESTful. A big part of that is developing sensible URLs, preferably ones that an ordinary human being can read and understand. I haven’t spent a lot of time designing URLs in the past, and I haven’t run across much in the way of good advice on the subject. One problem is that URLs shouldn’t change. This conflicts with the common Worse is Better practice of the web: first make it work, then make it work well.

URLs are used to refer to specific resources (usually web pages) in the system. Before choosing the URLs, it’s important to understand what those resources might be. In practice, users aren’t interested in individual annotations, but in lists of them. Here are some likely scenarios:

  1. a user retrieves all of his or her annotations
  2. someone lists all of the public annotations for a specific user
  3. a forum post is displayed with all associated annotations
  4. a user subscribes to a syndication feed of all public annotations in a particular discussion

These sets of annotations are all resources which deserve their own URLs. The simple approach might be to implement URLs like the following:

  1. /moodle/annotations
  2. /moodle/annotations?user=geof
  3. /moodle/annotations?forum=21&post=3
  4. /moodle/annotations.rss?forum=21
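The four URLs listed above can be generated by one small helper. This is only a sketch: the function name, its options and the fixed parameter order are assumptions, while the base path and parameter names are copied from the list.

```javascript
// Build an annotation-list URL from optional filters.
// `annotationListUrl` is a hypothetical helper; the base path and the
// parameter names (user, forum, post) come from the examples in the list.
function annotationListUrl(filters = {}, { base = "/moodle/annotations", feed = false } = {}) {
  const params = new URLSearchParams();
  for (const key of ["user", "forum", "post"]) {
    if (filters[key] !== undefined) params.set(key, String(filters[key]));
  }
  const query = params.toString();
  return base + (feed ? ".rss" : "") + (query ? "?" + query : "");
}

const urls = [
  annotationListUrl(),                              // scenario 1
  annotationListUrl({ user: "geof" }),              // scenario 2
  annotationListUrl({ forum: 21, post: 3 }),        // scenario 3
  annotationListUrl({ forum: 21 }, { feed: true }), // scenario 4
];
console.log(urls.join("\n"));
```

Every new kind of annotated resource would need another key in that list, which is the weakness of this style.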

These are easy to implement, but they’re not very expressive. What if we want to annotate something that isn’t a forum post? We could end up with a huge number of confusing URL formats and query parameters, each of which requires its own implementation in the annotation service. To make annotations generic, the obvious thing to do is to associate each one with a URL. But then we end up with annotation URLs like this:


Actually, I lie. The URL would need to be encoded:


That’s nasty. Furthermore, query strings are best avoided: they’re hard to read, hard to remember, hard to predict, and they are tightly tied to the implementation. It would be cool if we could still use the URL of the annotated document, but without encoding it, like this:


This is clear and succinct. Even better, it can be extended. For example, to narrow down the list to public annotations by geof we could have a URL like this:


A more inclusive listing is easy to reference by removing part of the URL. For example, to list all annotations for posts in forum 21 the URL might look like:


I actually implemented most of this, but I rolled it back. The problem is the actual structure of Moodle URLs, like this one to forum thread 21:


I don’t see a clean way to work with URLs like this, so I reverted to the query string approach. I could have used query-free URLs internally and created mappings, but that is weird and rather presumptuous. It also requires Apache mod_rewrite directives to make everything work, and that could conflict with other Moodle design features. This is natural: annotations are mixins, not an independent feature. But messing with Moodle URLs is best left to the core developers.
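To make the trade-off in this post concrete, here is a small sketch contrasting the two styles: passing the annotated document's URL as an encoded query parameter, and appending its path to the annotation service's path. The document URL, the `/annotations` prefix and the helper names are all hypothetical; they are not the URLs from the original examples.

```javascript
// Hypothetical address of an annotated document (not taken from the post).
const documentUrl = "http://example.org/forum/21/post/3";

// Query-string style: the document URL must be percent-encoded
// before it can be used as a parameter value.
const queryStyle = "/annotations?url=" + encodeURIComponent(documentUrl);

// Path style: append the document's own path, unencoded, to a prefix.
function pathStyle(docUrl, prefix = "/annotations") {
  const { pathname } = new URL(docUrl);
  return prefix + pathname;
}

// A broader listing is reached by dropping trailing path segments.
function dropSegments(path, count) {
  return path.split("/").slice(0, -count).join("/");
}

const postListing = pathStyle(documentUrl);
const forumListing = dropSegments(postListing, 2);

console.log(queryStyle);
console.log(postListing);
console.log(forumListing);
```

The path style only works when the annotated document's path is enough to identify it, which is where the Moodle URLs mentioned above got in the way.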


Web Annotation Update

I have added a few new features to the web annotation project, and updated the demo accordingly:

  • You can now click on an annotation to edit it.
  • New annotations are created in the margin along with existing annotations, rather than in a separate edit box at the top of each message.
  • The Enter and Esc keys can be used to save or cancel changes while editing.
  • There is a preliminary version of a pop-up menu. It appears after selecting text, allowing you to create a new annotation.

Web Annotation Systems Compared

I ran across this article on “newsmashing” on Kottke, which talks about a number of browser-based plug-in annotation systems. I considered such tools when I was investigating annotation, but decided against them and instead started work on a Javascript server-based annotation system. I expect I will be asked why, so I’m recording the pros and cons of each approach.

  1. Installation. A plug-in must be installed on every computer using the system. Many users simply won’t bother; in institutional environments they may not even have the choice. A server-based system only needs a browser.
  2. Generality. A plug-in system can mark up almost any page on the Web. A server-based system like the one I’m developing simply can’t do that – it’s part and parcel of the site it is used to annotate.
  3. Consistency. A server-based system is integrated with the host web site. Its annotations are specific to content, not individual pages. This means the annotations will still be visible if the content appears in more than one context. A plug-in, on the other hand, can’t handle such situations. For example, a forum message could be displayed in a listing by date or in a list of search results. The server-based system can display the annotations consistently in both places, but the plug-in will only be able to display on one page – the one that was marked up – and not the other. The browser-based system may also be confused by changes to text on the page outside the annotated content (such as changed menus, other messages, etc.), while the server-based system is much less sensitive to such changes.
  4. Integration and customization. A server-based system can take advantage of other features of the host site, like a common login framework; this could allow for on-site controls for viewing content by user or by group. The annotation systems can be customized to provide site-specific features, like the ability to search for annotations and show them in context, or to react intelligently when content is deleted or altered.
  5. Simplicity. Programming a server-based system is easier, hence cheaper.

In the long run, I would like to see standard browser-based annotation. I hope that the Web will evolve so that such a system can better support the features of server-based annotation (the problem is similar to what I deal with in my article on wiki-forum integration). But that’s a long way away. For now, the best approach is to implement something that’s useful today. If we’re lucky, it might help spur further development.


Web Annotation

I’ve started work on my web annotation project. I’m doing it under a grant from BC Campus as an add-on to the Moodle course management system, but I don’t see why it shouldn’t be modularized so it can be used elsewhere. I have uploaded the core of my annotation code, such as it is, to the Code section of this site.

The web desperately needs annotation; I particularly noticed that annotation was mentioned on several separate occasions in the sessions I attended at Northern Voice. It’s an intertwingling technology that turns readers into participants. Yet, as I talked about before, I haven’t seen a good solution yet.

No, that’s not quite true. I have seen two annotation systems I really liked: Flickr and Flickr’s ability to associate an annotation with a hotspot in a photo is fantastic; can be thought of as a page-level annotation system. (Hey, wouldn’t it be cool to have a browser extension which would display annotations for pages you visit?) Unfortunately, neither of these deals with ranges of text.

Ideally, annotation would be a feature of the browser. Even then, though, many web sites have legitimate uses for annotation which go beyond what a generic solution can provide: they may want to cross-link annotations, or allow users to see certain annotations created by other users. The widespread use of annotation on individual sites could be what it takes to propel browser developers to add the feature.

I’m a firm believer in simplicity: the 80/20 point, Worse is Better, and all of that. So my toolbox is XMLHTTP, DOM, CSS, probably REST. But the tough part isn’t the coding: it’s the user interface. It needs to be virtually transparent to the user. As much as possible, I want to emulate the real-world annotation interface available to any student (or vandal) with a book: highlighted text and notes in the margin. My work is fairly primitive (I’m only working on it part-time), but I’ve uploaded it in the hope of receiving suggestions or contributions. Go take a look at my example and see what you think.


Web Annotation

Recently I have been trying to figure out how to implement annotation for messages on an online bulletin board. Users should be able to highlight a passage of text, just as they can with a physical book. I haven’t found a good solution.

Annotation has been pursued by other organizations, including a standard by the W3C and Brendan Eich’s recent suggestion that annotation should be featured in future Mozilla browsers. There are also browser plug-ins. Most are per-page annotations or associate an annotation with a particular point in the text, not a text range. Others are IE-specific and require client installation, or require a separate annotation server. Even though the web is designed for the display of documents, and annotation is a common activity with paper documents, there seems to be no clean solution using standard web technologies.

The problem is that an HTML document is a hierarchy. For example, an article may consist of sections, which in turn break down into paragraphs inside which are emphasized elements of text. There is no way to mark a range of text which starts in one element and ends in another, and hence no way to highlight it.

There appear to be only two ways to achieve highlighting. The first is a hack: break up a highlighted section into multiple chunks and highlight each of them. Let’s take the following document:

<pre><p>Twas brillig and the slithy toves</p>
<p>Did gyre and gimble in the wabe</p></pre>

If we wish to highlight “the slithy toves Did gyre”, then the obvious way to do it would be like this:

<pre><p>Twas brillig and <em id='h1' class='highlight'>the slithy toves</p>
<p>Did gyre</em> and gimble in the wabe</p></pre>

Of course this isn’t valid HTML. Instead, we must break up the highlighting, like so:

<pre><p>Twas brillig and <em id='h1a' class='highlight'>the slithy toves</em></p>
<p><em id='h1b' class='highlight'>Did gyre</em> and gimble in the wabe</p></pre>

This is messy – determining where to place the highlighting is difficult and associating user actions with a highlight is complex (the application must deal with the fact that h1a and h1b are part of the same highlight block).
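The splitting step can be sketched in a few lines if each paragraph is treated as plain text. This is a simplified illustration rather than a real implementation: the function name and the character-offset convention are assumptions, nested inline markup is ignored, and the text is not HTML-escaped.

```javascript
// Wrap a highlight that may span several paragraphs in one <em> per paragraph.
// `paragraphs` is an array of plain-text strings; `start` and `end` are
// character offsets into the paragraphs joined with single spaces.
function splitHighlight(paragraphs, start, end, id) {
  let offset = 0; // position of the current paragraph in the joined text
  let piece = 0;  // number of chunks emitted so far (a, b, c, ...)
  return paragraphs.map((text) => {
    const from = Math.max(start - offset, 0);
    const to = Math.min(end - offset, text.length);
    offset += text.length + 1; // +1 for the joining space
    if (from >= to) return `<p>${text}</p>`; // highlight misses this paragraph
    const suffix = String.fromCharCode(97 + piece++);
    return (
      `<p>${text.slice(0, from)}` +
      `<em id='${id}${suffix}' class='highlight'>${text.slice(from, to)}</em>` +
      `${text.slice(to)}</p>`
    );
  });
}

const verse = [
  "Twas brillig and the slithy toves",
  "Did gyre and gimble in the wabe",
];
const selection = "the slithy toves Did gyre";
const selStart = verse.join(" ").indexOf(selection);
const marked = splitHighlight(verse, selStart, selStart + selection.length, "h1");
console.log(marked.join("\n"));
```

The shared `h1` prefix is what would let an application treat h1a and h1b as one highlight; handling selections that begin or end inside nested elements is where the real difficulty lies.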

The only apparent alternative is to extend the browser. Then data could be provided in a special non-HTML format (e.g. Javascript or custom embedded XML) to specify to the browser what needs to be highlighted. I don’t even want to imagine what would be required to make that work. Internet Explorer can likely do this right now, although I have been unable to find documentation for similar capabilities in Mozilla.

From a semantic point of view, highlighting often isn’t really part of the document, so the browser-extension solution does make a fair amount of sense. Most annotation work so far assumes this: annotations are owned by users, not by documents. But sometimes highlighting appears to be part of a document’s content. Perhaps a professor wishes to post a document with passages highlighted. We may say that such cases are degenerate (i.e. the professor owns the highlighting, not the document, otherwise the highlighting can be implemented as a series of smaller highlights as in the hack example). Even then, existing solutions are complex and inflexible. They make it difficult or impossible, for example, to include an annotation mechanism in a web application, or for developers to experiment or develop their own interface for collaborative annotation.

The ideal solution would make it possible to mark up a section of document in a way which doesn’t obey the HTML hierarchy (a problem similar to the difficulty the W3C has had in coming up with a method to select columns for styling in a table), or to select a range of characters in CSS – something which CSS currently only supports with the ::first-letter and ::first-line pseudo-elements. Neither of these is likely to be standardized soon if ever. That leaves the browsers. Until then, we must hack.

If you can help me with more information – even if it has been a while since I posted this – please drop me a line: geof at geof dot net.
