Smart Copy & Blog Microformats

Participants in online forums often quote excerpts from each other’s posts. The problem is that a simple copy-paste operation loses the context of the quoted post. For example, a link back to the source post could be invaluable for both human readers and machine analysis and search facilities. As part of the web annotation project I’m working on, I needed to address this lack. I found a way to include such links automatically, so that normal copy-paste operations from a forum post include the title of the post, the author, the date, and a link. (You can see a working demo if you follow the above link.) But my solution is flawed. I want to explain why, and how work on microformats could resolve the problems.

As I said, I have a working implementation that automatically adds a link and other metadata whenever someone copies from a forum post. It doesn’t matter where they paste: a web form, an email message, or a word processor will all receive the link and title of the source post.

The trick I used is Firefox-specific (although it may be possible to adapt it to Internet Explorer). Whenever a user selects text in a message, the web page silently inserts additional information into the selected text. This information is hidden from the user using CSS. When the hidden text is pasted, the CSS rules no longer apply and it becomes visible.
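To make the mechanism concrete, here is a minimal sketch of the idea rather than my actual code. The class name smart-copy-meta, the post object's field names, and both function names are made up for illustration. The sketch also assumes the page's stylesheet hides that class, and which CSS property survives a copy varies by browser.

```javascript
// Hypothetical class name. A stylesheet rule on the forum page is assumed
// to hide elements of this class from view.
const META_CLASS = "smart-copy-meta";

// Escapes the characters that are unsafe inside HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds the hidden markup for one post: a link whose text carries the
// title, author, and date. `post` is a plain object (assumed shape).
function buildHiddenCitation(post) {
  const label = `${post.title}, by ${post.author} (${post.date})`;
  return `<span class="${META_CLASS}">` +
    `<a href="${escapeHtml(post.url)}">${escapeHtml(label)}</a>: </span>`;
}

// Browser-only step: inserts the hidden markup at the start of the current
// selection, with the intent that it is copied along with the selected text.
function attachCitationToSelection(post) {
  const selection = window.getSelection();
  if (!selection || selection.rangeCount === 0 || selection.isCollapsed) {
    return;
  }
  const holder = document.createElement("div");
  holder.innerHTML = buildHiddenCitation(post);
  selection.getRangeAt(0).insertNode(holder.firstChild);
}
```

A page would call attachCitationToSelection from a handler that fires when the selection changes, such as a mouseup listener on the post body. A real version would also need to remove stale citation nodes left over from earlier selections.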

In other words, this is an ugly hack. And, as with all hacks, there’s a price to pay. If the start of the selection is styled (perhaps in italics), then the metadata will also be styled when it is pasted. (I won’t go into other downsides.) But I knew the interface had to be simple; otherwise, no one would use the feature. After all, the great beneficiaries of this are the readers, not the authors.

I think the ideal solution is a browser extension. This could cleanly add context information to copy operations without mucking up the source page. The problem is, the browser doesn’t know what that context information is. It could figure out the title and URL of a web page, but if that page contains multiple posts then that information might as well be gibberish. Each forum (or blog or what-have-you) has its own format for these details.

We need a standard microformat. This would specify how to mark up blog and forum posts in a standard way so that the browser can extract the relevant information (titles, dates, authors, URLs, etc.). A microformat wouldn’t require any additional Web infrastructure, and it would be easy to adapt existing software to comply. If such a standard were widely adopted, this kind of copy-paste operation could become a standard feature of browsers and Web sites. Beyond copying and pasting, there are numerous other applications of microformats, most of which haven’t even been dreamed up yet.
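As a rough illustration of what a browser or extension could do with such markup, here is a sketch. The class names (post-title, post-author, post-date, post-permalink) are invented for this example and don’t come from any published microformat, and the function names are hypothetical too.

```javascript
// Hypothetical selectors, for illustration only. A post might be marked up as:
//   <div class="post">
//     <h2 class="post-title">...</h2>
//     <span class="post-author">...</span>
//     <span class="post-date">...</span>
//     <a class="post-permalink" href="...">link</a>
//   </div>
const POST_FIELDS = {
  title: ".post-title",
  author: ".post-author",
  date: ".post-date",
  link: "a.post-permalink",
};

// Reads the context of one post from its container element. The argument
// only needs a querySelector() method, so any DOM element will do.
function extractPostContext(postElement) {
  const text = (selector) => {
    const el = postElement.querySelector(selector);
    return el ? el.textContent.trim() : null;
  };
  const link = postElement.querySelector(POST_FIELDS.link);
  return {
    title: text(POST_FIELDS.title),
    author: text(POST_FIELDS.author),
    date: text(POST_FIELDS.date),
    url: link ? link.getAttribute("href") : null,
  };
}

// Formats a one-line citation that could be prepended to the pasted text.
// Missing author, date, or URL fields are simply left out.
function formatCitation(context) {
  const byline = [context.author, context.date].filter(Boolean).join(", ");
  const head = byline ? `${context.title} (${byline})` : `${context.title}`;
  return context.url ? `${head} <${context.url}>` : head;
}
```

The point is that the extraction code never needs to know which forum software produced the page. It only relies on the agreed class names, which is exactly what a standard would pin down.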

For now, though, work on a standard is just beginning, and I’m left with the ugly JavaScript hack.