Annotation Release

I have uploaded a new version of web annotation for Moodle, with the following changes:

  • Users can now view all annotations for a single course. Previously, there was an option to view all (public or owned) annotations everywhere in the system. This was bad because in a big system it could produce an unwieldy and unhelpful list.
  • The database query for annotations is faster and more efficient, at the cost of adding a bit of redundant Moodle-specific data to the database. The previous version performed complex string comparisons to find out which annotations belonged to a post or discussion; this might have bogged down in systems with many annotations.
  • I completed localization support for the summary page.
  • The code includes untested support for teacher and author access modes (allowing a student, for example, to create an annotation that only a teacher can see). This feature is disabled because of the complexity it adds to the interface.
  • Annotation highlight display is faster. This improvement was already available in the stand-alone version.

The stand-alone version is unchanged.


Annotation Stand-Alone Fix

I have fixed a problem with the demo included with the stand-alone version of annotation. Database support wasn’t working due to an error in a path. This version also displays annotations faster.

An upcoming Moodle version will allow users to view annotations and highlights created by others. Although the demo doesn’t show this capability, part of the mechanism is included in this release.


Annotation Fix for IE

I discovered that my most recent changes to the annotation code had broken support for Internet Explorer. I have fixed the demo and uploaded new versions with the problem resolved.


Annotation Update

I don’t plan such frequent updates, but important bugs are exceptions. I have uploaded another fix to web annotation for Moodle. This resolves an incompatibility between annotation and the Moodle forum rating system. It also fixes the width of the annotation text entry box, which was previously much narrower than the margin.


Annotation Install Update

I found a couple of problems with the annotation install. These have now been fixed. Anyone who has downloaded the 20050816 release should download and install the 20050819 version instead.


Annotation Release

I have just released the 2005-08-16 version of annotation, both for Moodle and stand-alone. This release eliminates the dependency on Apache mod_rewrite and improves the documentation. It also includes the Internet Explorer support, smart copy, and searchable annotation summary from the previous release.

Only a few features remain on my wish list, among them localization support in Moodle, optimization, improved facilities for viewing other users’ annotations, and integration of annotation elsewhere in Moodle. I cannot promise any of these at the moment as there is other work I need to attend to. For now, I hope the documentation is helpful.

Of course I welcome any bug reports or other feedback.


More Annotating Exploder

Since I last wrote about annotation for Internet Explorer, I have solved many problems (including the crash). That’s because IE has many problems. For a time it seemed I had a difficult choice between breaking my application and breaking IE, but I believe I have found a reasonable compromise. My explanation will have to be technical.

Exploder in Space

IE, like other Microsoft tools (Visual Studio, FrontPage), mangles HTML. Here, for example, is a simple list I gave it (I have replaced the spaces with dots for clarity):


And here is what IE turns it into:


Pretty kinky, huh? IE did four things:

  1. It removed a close LI tag.
  2. It changed all the tags to upper case.
  3. It added several newlines.
  4. It removed several spaces.

The first two are pretty bad, but they don’t affect me. It’s the whitespace that’s the problem. I store highlights as character offsets from the start of a forum post. So if IE is adding or removing characters, highlights may be displayed in the wrong place on other browsers, just as annotations created on other browsers may look wrong on IE.
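To see how a few characters of difference turn into a misplaced highlight, here is a minimal sketch. It assumes a highlight is stored as a character offset into the post's text and models only one of IE's changes, collapsing runs of spaces; the helper names `offsetOf` and `collapseSpaces` are illustrative, not taken from the annotation code.

```javascript
// Offset of a passage, counted in characters from the start of the post.
function offsetOf(postText, passage) {
  return postText.indexOf(passage);
}

// Illustrative model of one IE change: runs of spaces shrink to one space.
function collapseSpaces(text) {
  return text.replace(/ {2,}/g, " ");
}

const postText = "First  sentence.   Second sentence.";
const ieText = collapseSpaces(postText);

const stored = offsetOf(postText, "Second"); // offset saved by Firefox
const seenByIE = offsetOf(ieText, "Second"); // where IE finds the same word

// Applying the stored offset to IE's text highlights the wrong characters.
console.log(stored, seenByIE, ieText.slice(stored, stored + 6));
```

Every collapsed run shifts all later offsets, so the error grows with the length of the post.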

Can I finesse the problem?

Since I have determined IE’s behavior¹, I could emulate it in other browsers by adding and deleting spaces accordingly. This would work, but the annotation code would then have to reproduce IE’s broken behavior forever, even if IE is fixed or retired. That’s a high price to pay.

I could just let things be: the problem is relatively rare. IE users may experience a problem caused by their choice of browser; frankly, that’s fine with me. But annotations created with IE will look wrong elsewhere, making other browsers seem broken when they’re not. That’s bad, and could encourage Exploder use.

I could fix single-character errors by declaring that highlights must begin and end on word boundaries. I’m storing the highlighted text, so I could also use that plus the position to adjust for errors. This would solve most problems, since each element can introduce at most one character of error. But I’m counting the offset from the start of the forum post, and a post can have many elements, so the error could get quite large. What if I count from the start of each child element instead, thereby reducing the maximum error to one character?
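The difference between a post-wide offset and an element-relative one can be sketched with a toy model: a post as an array of block elements reduced to their text, and IE assumed to add a newline after each block. The helpers `postOffset`, `resolveAbsolute` and `resolveRelative` are hypothetical names for illustration.

```javascript
// A post modelled as an array of block elements, each reduced to its text.
// Illustrative only; real code would work on the DOM.

// Post-wide offset: count every character in the blocks before this one.
function postOffset(blocks, blockIndex, offset) {
  let total = 0;
  for (let i = 0; i < blockIndex; i++) total += blocks[i].length;
  return total + offset;
}

// Resolve a post-wide offset against the whole post's text.
function resolveAbsolute(blocks, start, length) {
  return blocks.join("").slice(start, start + length);
}

// Resolve an element-relative locator: only its own block is consulted.
function resolveRelative(blocks, loc) {
  return blocks[loc.block].slice(loc.offset, loc.offset + loc.length);
}

const firefoxBlocks = ["Intro text.", "More text.", "The key point."];
// Suppose IE adds a newline after every block element.
const ieBlocks = firefoxBlocks.map((b) => b + "\n");

const loc = { block: 2, offset: 4, length: 3 }; // the word "key"
const start = postOffset(firefoxBlocks, loc.block, loc.offset);

console.log(resolveAbsolute(firefoxBlocks, start, 3)); // key
console.log(resolveAbsolute(ieBlocks, start, 3)); // e k (two blocks, two characters off)
console.log(resolveRelative(ieBlocks, loc)); // key
```

The post-wide offset drifts by one character per preceding block, while the element-relative locator is unaffected because IE's extra character falls outside the span it counts.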


XPointer, the W3C standard for specifying a location in an XML document, works like this. In fact, the W3C’s Annotea project uses XPointer to locate annotations. If I adopted XPointer, it would solve my problem and bring me closer to Annotea standards.

But there’s a reason I’m not using XPointer. The specification is very complex, so much so that adoption is limited and I don’t believe anyone has implemented the whole thing. Worse, my annotation is a special case.

For a start, I need to display annotations in order, both in the margin next to a post and in the annotation summary. XPointers aren’t ordered (actually, a subset of simple XPointers can be ordered, but IE messes that up by creating and deleting whitespace-only text nodes). Furthermore, highlights modify a document, effectively breaking any XPointers. I would have to map XPointers between an idealized highlight-free document and the actual highlighted one.
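Simple path-based locators can indeed be ordered, as the sketch below shows, assuming a locator is a list of child indices from the post's root to a text node plus a character offset. This is a hypothetical structure, not real XPointer syntax. The ordering holds only while the child indices are stable, which is exactly what IE's appearing and disappearing whitespace-only text nodes, and the text nodes split by highlights, undermine.

```javascript
// Hypothetical XPointer-like locator (not real XPointer syntax): a path of
// child indices from the post's root to a text node, plus a character
// offset within that text node.
function compareLocators(a, b) {
  const n = Math.min(a.path.length, b.path.length);
  for (let i = 0; i < n; i++) {
    if (a.path[i] !== b.path[i]) return a.path[i] - b.path[i];
  }
  // Paths agree up to the shorter length: a shorter path (an ancestor)
  // sorts first; identical paths are ordered by character offset.
  if (a.path.length !== b.path.length) return a.path.length - b.path.length;
  return a.offset - b.offset;
}

const notes = [
  { note: "c", path: [2, 0], offset: 5 },
  { note: "a", path: [0, 1], offset: 9 },
  { note: "b", path: [0, 1], offset: 12 },
];
notes.sort(compareLocators);
console.log(notes.map((x) => x.note).join("")); // abc
```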

I think I made the right decision the first time. I want to be putting annotation in the hands of users, not pioneering complex standards.

A Hack that Would Work

Early on in this process I did have one other thought, but I discarded it because it’s too ugly. Since whitespace is causing all the trouble, I could perform all offset calculations without regard to whitespace. Just insist that highlights not begin or end on a space, and don’t count spaces. Then IE can do whatever perverse acrobatics it wants and it can’t hurt me anymore!
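A non-whitespace offset is easy to compute and to map back, as this minimal sketch shows. It assumes plain text stands in for the post and that IE's changes touch only whitespace; `nonWhitespaceOffset` and `rawIndex` are illustrative names.

```javascript
const isSpace = (ch) => /\s/.test(ch);

// Count the non-whitespace characters that precede position `index`.
function nonWhitespaceOffset(text, index) {
  let count = 0;
  for (let i = 0; i < index; i++) {
    if (!isSpace(text[i])) count++;
  }
  return count;
}

// Inverse: raw position of the non-whitespace character numbered `nw`
// (counting from 0), or -1 if the text is too short.
function rawIndex(text, nw) {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    if (isSpace(text[i])) continue;
    if (count === nw) return i;
    count++;
  }
  return -1;
}

const firefoxText = "First  sentence.\n   Second sentence.";
const ieText = "First sentence.\nSecond sentence.";

// Stored from Firefox's text, resolved against IE's differently spaced text.
const nw = nonWhitespaceOffset(firefoxText, firefoxText.indexOf("Second"));
const pos = rawIndex(ieText, nw);
console.log(nw, ieText.slice(pos, pos + 6)); // 14 Second
```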

I do have an objection: who ever heard of a “non-whitespace character offset”? These offsets may be what I need to make my application work, but do they really mean anything? Long programming experience has taught me how risky it is to start playing around with representations that don’t correspond to anything in the world. These annotation offsets would be part of a format, and formats can long outlive code.

Actually, there is a measure which ignores spaces but is more meaningful and not specific to any browser: word count.

The question then becomes, “what’s a word”? Unicode can represent punctuation, dingbats, diacritics, and more; figuring out a convention universal to all the countless possible languages is beyond me. The obvious solution is to define a word as a sequence of non-whitespace characters. This definition is less than perfect, but I believe it’s far superior to a raw non-whitespace character offset (it’s also easier to read and debug).

Determining word breaks still isn’t straightforward. For example:


Paragraphs break words, so that’s three words. My system might refer to the word “million” as 2.0-2.7 (word 2, characters 0 through 7). But take this:

one<em>million</em> dollars

Emphasis doesn’t break words, so that’s two words; using the same scheme, “million” would be represented as 1.3-1.10.
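The word-based scheme can be sketched in a few lines. This assumes the post is reduced to a sequence of text runs, each flagged if it opens a new block-level element. `flatten` and `toWordLocator` are hypothetical names, and the sketch handles only ranges that lie within a single word.

```javascript
// A post as a sequence of text runs. `block: true` marks a run that opens a
// new block-level element (such as a paragraph); inline elements such as
// <em> just continue the current block. Hypothetical representation.
function flatten(runs) {
  // A space at each block boundary ensures blocks always break words.
  return runs.map((r, i) => (r.block && i > 0 ? " " : "") + r.text).join("");
}

// Convert a [start, end) range that lies within one word into a
// "word.char-word.char" locator: words count from 1, characters from 0.
function toWordLocator(flat, start, end) {
  let word = 0;
  let wordStart = -1;
  for (let i = 0; i <= start; i++) {
    const here = /\s/.test(flat[i]);
    const prev = i === 0 || /\s/.test(flat[i - 1]);
    if (!here && prev) {
      word++; // a new word begins at position i
      wordStart = i;
    }
  }
  return `${word}.${start - wordStart}-${word}.${end - wordStart}`;
}

const paragraphs = flatten([
  { text: "one", block: true },
  { text: "million", block: true },
  { text: "dollars", block: true },
]);
const emphasis = flatten([
  { text: "one", block: true },
  { text: "million", block: false },
  { text: " dollars", block: false },
]);

for (const flat of [paragraphs, emphasis]) {
  const start = flat.indexOf("million");
  console.log(toWordLocator(flat, start, start + "million".length));
}
// prints 2.0-2.7, then 1.3-1.10
```

Because characters are counted from 0, as in 2.0-2.7, the emphasis example yields 1.3-1.10: “million” follows the three letters of “one” within a single word.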

It’s messy. But it solves the problem — completely and in all browsers. I think it’s the lesser of many evils.


¹ I believe these are the rules for spacing: (1) ignoring tags, retain only the first of multiple spaces; (2) remove all leading spaces at the start of block-level elements; (3) add a newline after the close of a block-level element.
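To make the three spacing rules precise, they can be written as a small function. This is the kind of emulation rejected under “Can I finesse the problem?”, so it is shown only for concreteness. It assumes each block-level element has already been reduced to its text with inline tags ignored, and the name `ieSpacing` is hypothetical.

```javascript
// Sketch of the three spacing rules, as inferred from observation (not
// documented IE behavior). A post is an array of block-level elements, each
// already reduced to its text with inline tags ignored.
function ieSpacing(blocks) {
  return blocks
    .map(
      (text) =>
        text
          .replace(/ {2,}/g, " ") // rule 1: keep only the first of several spaces
          .replace(/^ +/, "") + // rule 2: drop leading spaces in a block
        "\n" // rule 3: newline after each block closes
    )
    .join("");
}

console.log(JSON.stringify(ieSpacing(["  one  two", "three   four "])));
// "one two\nthree four \n"
```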


Annotating Exploder

I have tried to find a way to make my web annotation work in Internet Explorer. I wrote this to illustrate how painful and expensive it is to support outdated or poorly-designed systems, such as Internet Explorer, and to solicit advice. I have tried to avoid technical details, but I hope I’ve included enough that this can also help programmers who may run into some of the same problems I did.

When the user selects a passage of text, the major browsers all provide a way to find out what that text is. For annotation though, this isn’t enough: I also need to know where it is so that I can highlight the passage when a user creates an annotation. Five years ago, the W3C released the Range standard to address this. To my knowledge, this has only been adopted by the Mozilla family of browsers.

Microsoft, however, is famous for coming up with their own non-standard mechanisms for doing things, and text selection is no different. Internet Explorer does provide a way to find out more about what text the user has selected. It even provides the location — but it doesn’t indicate what paragraph or character in the document: instead it returns the pixel location of the selection on the screen. This is useless, because that location could vary according to font size, margins, window shape, etc., and there’s no way to convert from a pixel location into something more useful.

But it is possible to modify the text the user has selected. So, in theory, I could insert an invisible marker in the text, then search for it and — presto! — find out where it is in the document. There is a small difficulty, because there’s no way to insert a marker — I can only replace the entire passage of selected text. But that’s OK. I can find out what was selected, add the marker, and then replace the passage with the modified version.
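The marker trick can be simulated without a browser. The toy class below stands in for IE's selection (in IE itself this role is played by a TextRange): it reveals the selected text and lets it be replaced, but never reveals the position. `OpaqueSelection`, `locateSelection` and the choice of U+2063 as the invisible marker are assumptions for illustration, not the annotation code's.

```javascript
// Toy stand-in for IE's selection (hypothetical): it exposes the selected
// text and lets us replace it, but does not reveal where the text is.
class OpaqueSelection {
  #start;
  #end;
  constructor(page, start, end) {
    this.page = page; // { text: "..." }
    this.#start = start;
    this.#end = end;
  }
  text() {
    return this.page.text.slice(this.#start, this.#end);
  }
  replace(newText) {
    const t = this.page.text;
    this.page.text = t.slice(0, this.#start) + newText + t.slice(this.#end);
    this.#end = this.#start + newText.length;
  }
}

const MARKER = "\u2063"; // invisible separator, assumed absent from the page

// Find where the selection lies using only text() and replace().
function locateSelection(sel) {
  const original = sel.text();
  sel.replace(MARKER + original); // add the marker and paste the text back
  const start = sel.page.text.indexOf(MARKER); // search for the marker
  sel.replace(original); // restore the page
  return { start, end: start + original.length };
}

const demoPage = { text: "alpha beta alpha beta" };
// Select the second "alpha"; searching for the text alone would find the first.
const demoSel = new OpaqueSelection(demoPage, 11, 16);
console.log(locateSelection(demoSel)); // { start: 11, end: 16 }
```

The marker is what distinguishes the selected passage from an identical passage elsewhere on the page.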

But this is Microsoft. In typical fashion, they have tried to be smart only to end up stupid. When IE copies the text, it checks to see whether there’s anything missing. For example, there are markers that indicate where a paragraph ends and begins. If you copy the last part of a paragraph, the marker indicating the start of the paragraph will be missing, so IE helpfully adds it back. In other words, if you copy and paste the text — even without making changes — IE may insert paragraphs and mangle the web page.

It’s a Catch-22, like some sort of perversion of the Uncertainty Principle. I can know what text the user selected, or I can know where it is on the page, but I can’t know both!

I didn’t give up, though. If Microsoft was adding something to the selection, I could remove it again. I inserted hidden location numbers in the text before it was copied, which indicated where in the document each paragraph (or other element) started. When Microsoft copied the text, I could find the original location numbers, compare the copied versions with the originals, and remove any garbage added by IE.
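The location-number idea can be sketched as follows, assuming each block's text is prefixed with a hidden tag recording where that block starts in the original text. The tag format, the separator character and the names `tagBlocks` and `startFromTags` are hypothetical, and removing IE's added garbage is not modelled here.

```javascript
// Hypothetical encoding: before copying, each block's text is prefixed with a
// hidden tag SEP + offset + SEP, where offset is where that block starts in
// the original (untagged) text. SEP is an assumed invisible character.
const SEP = "\u2063";

function tagBlocks(blocks) {
  let offset = 0;
  return blocks
    .map((text) => {
      const tagged = SEP + offset + SEP + text;
      offset += text.length;
      return tagged;
    })
    .join("");
}

// Recover the start of a copied selection from the first tag it contains.
// Returns null when no tag was copied (the case that needs another approach).
function startFromTags(copied) {
  const m = copied.match(new RegExp(SEP + "(\\d+)" + SEP));
  if (m === null) return null;
  // The text before the first tag holds no tags, so its length is exactly
  // the number of original characters between the selection start and the
  // start of the tagged block.
  return Number(m[1]) - m.index;
}

const blocks = ["Hello world. ", "Second para."];
const tagged = tagBlocks(blocks);
// A selection running from "world." into the second block, tags included.
const copied = tagged.slice(tagged.indexOf("world"), tagged.indexOf("Second") + 6);
console.log(startFromTags(copied)); // 6, the offset of "world" in the original
```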

And you know what? It worked! Now I could paste my text back into the document, with my new marker added, search for it, and find the selected region. I tried it. And just about cried. Guess what Internet Explorer did?

It added markers.

If I copied the end of one paragraph and the start of another, it added a new blank paragraph in between. Again, Microsoft’s big brain was causing trouble. Only this time, the changes were in the middle of the text I was pasting, and I had no idea what the rule for adding them was. I gave up, but the problem stayed in my mind.

You have probably noticed the flaw in my reasoning. When I compared the hidden location numbers in the selection with the original document, I was figuring out the location of the selection in the original. So why didn’t I just use that location and forget about pasting in my hidden marker? The reason is an exception: there may not be any markers in the selected text. In that case, I need to paste something in to find the selection.

But there’s the answer: Internet Explorer only adds garbage if the selection crosses a paragraph boundary or the like. Case 1: the selection crosses such a boundary, so I can use the hidden location numbers and never paste anything. Case 2: the selection does not cross such a boundary, so I can freely paste in my hidden marker and IE won’t mess things up.

As soon as I realized this, I hurried to make it work. It did! The problem was solved. But I realized there was one rare exception. Under certain conditions, if the document contains a lot of repetition, the solution for Case 2 might become confused and select the end of a paragraph when it should select the beginning. The user would select one passage of text, only to see the software move the selection to a different (although identical) passage. This is very unlikely, but it bothered me. I slept on it.

When I woke, I remembered something about Internet Explorer. In addition to providing the (useless) pixel location of the selected text, and allowing the selection to be replaced, it provides a way for the programmer to move the start and end points of the selection. So, for example, I could extend the selection by a word or a sentence. But what if I reduced it instead? What if I made the selection only one character long? Such a short selection could not cross any paragraph boundaries, so Internet Explorer wouldn’t add any gobbledygook. Then I could always safely apply the solution to Case 2. The end of the selection wouldn’t be a problem — I would just count how long the selection was before I changed it. And, best of all, this implementation would never suffer from the exception that afflicts Case 1.
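The final approach can be simulated with another toy selection. In IE the end point would be moved with TextRange's moveEnd; here `ToySelection.moveEnd` plays that role. The toy also mimics IE's meddling by doubling a line break whenever replaced text spans one. All names, and U+2063 as the marker, are assumptions; this is a sketch of the idea, not the nine lines of Javascript mentioned below.

```javascript
// Toy stand-in for IE's selection (hypothetical). It exposes the selected
// text, lets us replace it and move its end point, but never reveals its
// position. Replacing text that spans a line break doubles the break,
// mimicking IE adding an empty paragraph.
class ToySelection {
  #start;
  #end;
  constructor(page, start, end) {
    this.page = page; // { text: "..." }, where "\n" marks a paragraph boundary
    this.#start = start;
    this.#end = end;
  }
  text() {
    return this.page.text.slice(this.#start, this.#end);
  }
  moveEnd(n) {
    this.#end += n; // shift the end point by n characters
  }
  replace(newText) {
    const garbled = this.text().includes("\n") ? newText.replace("\n", "\n\n") : newText;
    const t = this.page.text;
    this.page.text = t.slice(0, this.#start) + garbled + t.slice(this.#end);
    this.#end = this.#start + garbled.length;
  }
}

const MARKER = "\u2063"; // assumed not to occur in the page

// Assumes the selection starts on a visible (non-newline) character.
function locate(sel) {
  const length = sel.text().length; // remember the full length first
  sel.moveEnd(1 - length); // shrink to the first character
  const first = sel.text(); // one character cannot span a boundary,
  sel.replace(MARKER + first); // so pasting the marker is safe
  const start = sel.page.text.indexOf(MARKER);
  sel.replace(first); // put the page back as it was
  return { start, end: start + length };
}

const demoPage = { text: "First para.\nSecond para." };
const demoSel = new ToySelection(demoPage, 6, 18); // "para.\nSecond"
console.log(locate(demoSel), JSON.stringify(demoPage.text));
```

Pasting a marker over the whole selection would have inserted a stray blank line; shrinking to one character first avoids that, and the remembered length restores the end.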

So that’s what I did. It works. And it’s embarrassing, because after all of that, it really only takes nine lines of Javascript code to do something I thought couldn’t be done. I can produce the same location information that the W3C standard specifies.

The next step was to add IE support to my annotation code. I had already added some support; now I made it possible to create annotations. It works. But it also crashes, frequently, somewhat randomly, and with no apparent connection to my selection solution. It crashes so hard it offers to send a bug report to Microsoft and recommends rebooting Windows.

Now maybe there’s a bug in my code. That would be no excuse for Exploder, but at least I might be able to fix it¹. But, unlike Firefox, Explorer provides virtually no facilities for debugging. So, for now, although I’ve struggled to avoid it, annotation is Firefox only.

The experience emphasizes six lessons:

  1. Focus is essential to programming, but it makes it hard to see if you’re on the wrong path.
  2. Writing about the problem, like talking about it, helps to gain perspective.
  3. Giving up can also help. Several times I said “it’s just not possible” and walked away. Each time, when I came back it was because I realized I had been too focused on a particular aspect of the solution and was ignoring something important.
  4. Microsoft’s selection API is very poorly designed. It seems to simply expose the information they need to provide the features they implement (copy, paste, keyboard selection), with no thought to how it might be used by anyone else.
  5. Speaking of Microsoft, don’t try to be smart until you can handle being simple.
  6. Supporting poorly-designed or obsolete systems is expensive — partly because it takes a long time, but even more so because the amount of time required is impossible to predict. Until the very end, there is no certainty the problem can even be solved. Even now, I don’t know whether my time was wasted or whether a solution will appear.


¹ Apparently IE’s Javascript garbage collector is buggy. It’s highly likely, therefore, that I am leaking memory. Yet it seems unlikely this would be enough to make it crash as fast and as hard as it does.


Smart Copy & Blog Microformats

Participants in online forums often quote excerpts from each other’s posts. The problem is that a simple copy-paste operation loses the context of the quoted post. For example, a link back to the source post could be invaluable for both human readers and machine analysis and search facilities. As part of the web annotation project I’m working on, I needed to address this lack. I found a way to include such links automatically, so that normal copy-paste operations from a forum post include the title of the post, the author, the date, and a link. (You can see a working demo if you follow the above link.) But my solution is flawed. I want to explain why, and how work on microformats could resolve the problems.

As I said, I have a working implementation that automatically adds a link and other metadata whenever someone copies from a forum post. It doesn’t matter where they copy to: they can paste into a web form, an email message or a word processor; regardless, the link and title of the source post will be included.

The trick I used is Firefox-specific (although it may be possible to adapt it to Internet Explorer). Whenever a user selects text in a message, the web page silently inserts additional information into the selected text. This information is hidden from the user using CSS. When the hidden text is pasted, the CSS rules no longer apply and it becomes visible.

In other words, this is an ugly hack. And, as with all hacks, there’s a price to pay. If the start of the selection is styled—perhaps in italics—then the metadata will also be styled when it is pasted. (I won’t go into other downsides.) But I knew the interface had to be simple; otherwise, no-one would use the feature: after all, the great beneficiaries of this are the readers, not the authors.
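The hidden metadata itself is just a small piece of HTML. A minimal sketch of how it might be built follows. The class name `smartcopy`, the wording and the function names are hypothetical, and the page's stylesheet is assumed to hide that class on screen; the exact CSS used isn't specified here.

```javascript
// Escape text for inclusion in HTML.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the hidden citation inserted at the start of a selection. The class
// name "smartcopy" is hypothetical; the page's stylesheet is assumed to hide
// it on screen, and that rule no longer applies once the HTML is pasted.
function smartCopyPrefix(post) {
  return (
    '<span class="smartcopy">' +
    `${escapeHtml(post.author)} wrote in ` +
    `<a href="${escapeHtml(post.url)}">${escapeHtml(post.title)}</a>` +
    ` (${escapeHtml(post.date)}): ` +
    "</span>"
  );
}

console.log(
  smartCopyPrefix({
    author: "Alice",
    title: "Re: Week 3 <readings>",
    url: "https://example.org/forum/post?id=42&ref=7",
    date: "2005-08-16",
  })
);
```

Because the snippet sits inside the selection, it inherits whatever styling applies at the selection's start, which is the source of the italics problem described above.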

I think the ideal solution is a browser extension. This could cleanly add context information to copy operations without mucking up the source page. The problem is, the browser doesn’t know what that context information is. It could figure out the title and URL of a web page, but if that page contains multiple posts then that information might as well be gibberish. Each forum (or blog or what-have-you) has its own format for these details.

We need a standard microformat. This would specify how to mark up blog and forum posts in a standard way so that the browser can extract the relevant information (titles, dates, authors, URLs, etc.). A microformat wouldn’t require any additional Web infrastructure, and it’s very easy to adapt existing software to comply. If such a standard were widely adopted, this kind of copy-paste operation could become a standard feature of browsers and Web sites. Beyond copying and pasting, there are numerous other applications of microformats, most of which haven’t even been dreamed up yet.
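With such markup, a browser extension could pull the context out of each post rather than relying on the page title and URL. Here is a sketch over a toy element tree rather than the real DOM. The class names (`post-title`, `post-author` and so on) are placeholders, not taken from any finished specification.

```javascript
// Toy element tree: { classes, text, href, children }. The class names used
// below are hypothetical placeholders, not from any finished microformat.
function findByClass(node, cls) {
  if ((node.classes || []).includes(cls)) return node;
  for (const child of node.children || []) {
    const hit = findByClass(child, cls);
    if (hit) return hit;
  }
  return null;
}

// Pull the citation details out of one marked-up post. Working per post,
// not per page, is what keeps this useful when a page holds many posts.
function postContext(postNode) {
  const text = (cls) => {
    const n = findByClass(postNode, cls);
    return n ? n.text : null;
  };
  const link = findByClass(postNode, "post-permalink");
  return {
    title: text("post-title"),
    author: text("post-author"),
    date: text("post-date"),
    url: link ? link.href : null,
  };
}

const post = {
  classes: ["post"],
  children: [
    { classes: ["post-title"], text: "Re: Week 3 readings" },
    { classes: ["post-author"], text: "Alice" },
    { classes: ["post-date"], text: "2005-08-16" },
    { classes: ["post-permalink"], href: "https://example.org/post/42", text: "link" },
    { classes: ["post-body"], text: "I agree with..." },
  ],
};
console.log(postContext(post));
```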

For now though, work on a standard is just beginning, and I’m left with the ugly Javascript hack.


Annotation & Mandarin

I’ve been busy lately working on web annotation. I have produced a patch to add annotation to Moodle, and a stand-alone version suitable for integration into other web applications. Most recently, I’ve added a screencast of the system in action. I’ve also uploaded a simple Mandarin quiz web application which I use to teach myself Mandarin Chinese vocabulary. Feel free to try it.
