Wikis and Middle Age

Today I noticed that two of the places I visit frequently – Creative Commons among them – have new wikis. Blogs are taking over from personal homepages1. This is the time to turn the Web from a collection of sites into something bigger.

A technology experiences the most experimentation when it is young. As it grows older and gains acceptance, it tends to settle into familiar forms. Sitcoms are 30 minutes, dramas an hour long. Credits are at the end of a movie (they used to be at the beginning). Cars have steering wheels instead of reins. Windows are dragged by the title bar, not by using a menu or holding a control key.

There is still tremendous experimentation, with AJAX applications and the great examples of Flickr and Google Maps. It will take a long time for this to settle down. But I’m not talking about fancy web services. For regular tasks, the kinds of content-based sites the Web was originally designed for, we are finding the sweet spots. Most sites now follow standard practices: wikis have a search field and an index, blogs have archives by date and topic, e-commerce sites have shopping carts, forums have lists of threads.

I am one of those who thinks one of the greatest advantages of the Web over desktop applications with “rich” interfaces is the simple user interface vocabulary; I am pleased I haven’t seen a Javascript drop-down menu in ages. The simplicity of the tools forces us to think harder about what we want a site to do, rather than inventing complex new ways to do it2.

The emerging standard formats – wikis, forums, blogs – look set to stay. But there’s still plenty of room for innovation: that it has taken this long to get this far only shows how much more might be done. I think one of the biggest weaknesses is integration. We can depend on these technologies sticking around; it’s time for them to start talking to each other3. That will increase the value of all of them, and increase the ability of all of us to build on each others’ work. And that, as the mashers and wikipedians and bloggers have shown, is what it’s all about.


1 Searching Google, I find 540M hits for homepage and 204M hits for blog. The word “blog” is newer, and not all homepages are personal sites. See also Sifry.

2 Some applications need more, especially for authoring. I’m violating the principle myself with web annotation.

3 The pending release of the Atom spec is a step in the right direction.


Wiki-Forum Integration

I would be surprised if what I’m proposing hasn’t been done, but I haven’t found it anywhere. Web forums are often full of useful information, but it isn’t organized. It would be great if it was easy to pull the best posts out of the forum and plug them into a more organized repository, like a wiki. Here’s what I envision.

Every forum post has a button on it. Clicking that button creates a new wiki page, copies the content of the post to the page, and creates links between the two of them. The link on the wiki page allows readers to place the item in context with the discussion which spawned it; the link on the forum post allows any future readers to find more information about the topic.

It is critically important that a) this is an activity performed by readers, and b) it must be simple enough that it doesn’t break the flow of ideas in a good discussion.

So there would probably be two buttons on a forum post: a quick and easy button to copy the post to a generic holding area in the wiki which would later be integrated (the “one-click shopping” approach), and a different button for copying to the wiki and dealing with integration now.
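The two-button idea is simple enough to sketch. Here is a minimal illustration in Python, assuming hypothetical in-memory stores for the forum and the wiki – no real forum or wiki software exposes exactly this API:

```python
# A sketch of the forum-to-wiki copy button. The Forum post is a plain
# dict and the wiki a dict of pages; both are invented for illustration.

class WikiPage:
    def __init__(self, title, content, source_url):
        self.title = title
        self.content = content
        self.source_url = source_url  # link back to the spawning discussion

def copy_post_to_wiki(post, wiki, integrate_now=False):
    """Create a wiki page from a forum post and cross-link the two.

    With integrate_now=False the page lands in a generic holding area
    (the "one-click shopping" button); with integrate_now=True the
    reader is expected to file the page properly right away.
    """
    title = post["title"] if integrate_now else "Holding/" + post["title"]
    page = WikiPage(title, post["body"], post["url"])
    wiki[title] = page            # wiki page links back to the forum
    post["wiki_link"] = title     # forum post links forward to the wiki
    return page
```

The point of the sketch is the symmetry: one operation creates the page and both links, so the reader never breaks the flow of the discussion.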

What made me think of all of this is a forum discussion about creating a new horror mythos. Some of the posts are brilliant, and I’d like to pull the good ones together. Even better, I would like to share the results and benefit from whatever anyone else might add. I’m sure there’s a wiki somewhere with such an effort. The thing is, the content already exists in forums. All that’s needed is a wiki, a default Creative Commons license for each post, and a quick and easy way to integrate the two.

Forum-wiki integration would be very useful elsewhere. One person suggests something similar for creating FAQs. I imagine education discussion forums, as on Moodle and Sakai, could benefit equally, as could any number of other areas. So many domains would benefit from their own mini Wikipedia. The keys to success on the Web are content and links. We already have the content.

As for links, ideally it would be possible to copy to any number of other wikis hosted elsewhere, but there’s no such thing as a standard export format for forum posts. The first idea that comes to my mind is to allow each post to generate a mini chunk of RSS or Atom, which is then sent to a wiki supporting a standard REST protocol. But for now, a straightforward custom wiki-forum integration is the place to start.
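To make the RSS/Atom idea concrete, here is a sketch of rendering a forum post as a standalone Atom entry. The entry itself follows the Atom namespace; the notion that a wiki would accept such entries over a REST protocol is the speculation above, not an existing standard:

```python
# Render a forum post (a plain dict, invented for illustration) as an
# Atom <entry> document that could be POSTed to a receiving wiki.

from xml.etree import ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def post_to_atom_entry(post):
    """Build a minimal Atom entry: title, link back to the post, content."""
    ET.register_namespace("", ATOM_NS)  # serialize with a default namespace
    entry = ET.Element("{%s}entry" % ATOM_NS)
    ET.SubElement(entry, "{%s}title" % ATOM_NS).text = post["title"]
    link = ET.SubElement(entry, "{%s}link" % ATOM_NS)
    link.set("href", post["url"])       # preserves the link to the discussion
    content = ET.SubElement(entry, "{%s}content" % ATOM_NS)
    content.set("type", "text")
    content.text = post["body"]
    return ET.tostring(entry, encoding="unicode")
```

A real implementation would also carry the author, the date, and the license; the cross-linking is the part that matters.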


Spam Markets

If we want to eliminate spam, we should attack the market, not the medium. Andy Lester argues that content filtering is a counterproductive failure which diverts efforts from a real solution. I think he is mistaken.

Spammers are in business to make money. If we can reduce their profitability, we will reduce spam. Content filtering forces spammers to obfuscate their messages and to discard virtually all of the design tools used by advertisers to influence their audience. This surely reduces their market, and a smaller market will have fewer spammers.

Similarly, laws – even if not global – are not useless. If spammers must go overseas, that costs them money; it may also make filtering more effective. If their suppliers are liable, or the credit card companies, their supply is endangered. And if spammers must pay fines or hire lawyers, that costs them also, as does a decreased supply of spam-friendly ISPs.

These things work together to make spam uneconomical (increased security on PCs is certainly part of the solution). Even together, they may not be enough to eliminate it. But we haven’t even tried: our laws are feeble or non-existent.

The alternative, the approach that can wipe spam from the face of the planet, may require that we discard the potential for anonymity and hand security and trust to a central authority. There may be other solutions, but those who solve the problem may have a lot to gain from taking control. For the rest of us, it is a high price to pay.

We don’t have to eliminate spam. Two spam messages per day are qualitatively different from two hundred per day. If we can achieve that without rewriting or sacrificing decentralized control of the network, we will have won.


No Politics in nofollow

Google’s new nofollow attribute for links will hopefully reduce comment and trackback spam on blogs. Defeating spam is essential. But reading Tim Bray’s blog, I see that there may be a cost.

The goal of nofollow is to reduce spam. It does that by excluding links from search engine ranking calculations. Google suggests that blogging software use it to flag links in trackbacks and comments on blogs: a worthwhile trade-off if it saves us from spam. However, some people may use nofollow on links pointing to sites with which they disagree.

This is a dangerous abuse of the tool, for it attacks the connections that make the Web valuable. Furthermore, it encourages one of the other threats to the Web and the blogosphere: parochialism. It becomes easier and easier to converse only with people of similar interests and opinions. The links that cross-cut interests and convictions are valuable because they tie us into a larger public conversation.

Even distasteful opinions should be exposed to the world. Take a recent column on the Ayn Rand Institute site, which included claims like the following: “The United States government, however, should not give any money to help the tsunami victims. Why? Because the money is not the government’s to give.” The Institute has since retracted the original article. The fact that we can deplore the removal of the original text makes it clear how important it is for this kind of material to be accessible.

A link created by the author of a page flags its target as relevant to the page. The problem with spam is that it is irrelevant: it is all noise, no signal. Nofollow is an appropriate way to filter out the noise; it should not be used to attack the signal. The vote-links proposal offers a better way to deal with disagreement. Just as the blogosphere deplores posts that are silently updated, we should expect that nofollow never be applied to relevant links deliberately created by the author of a page.
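The distinction argued above fits in a few lines. This sketch renders links the way blogging software might, flagging only reader-contributed links; the trusted-author flag is hypothetical, standing in for however a blog distinguishes its author from commenters:

```python
# Apply rel="nofollow" only to links the page author did not vouch for
# (comments, trackbacks) - never to links the author deliberately created.

from html import escape

def render_link(href, text, added_by_author):
    """Emit an <a> tag, adding nofollow for reader-contributed links."""
    rel = "" if added_by_author else ' rel="nofollow"'
    return '<a href="%s"%s>%s</a>' % (escape(href, quote=True), rel, escape(text))
```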

In Tim Bray’s case, I am of two minds: spammers so flagrantly violate the social contract that they deserve no consideration. But the whole point of this exercise is that the Web is more important than the spammers. Making such fine distinctions sets a risky precedent.


Networked Folksonomies

I suspect that with thought, other information can be leveraged to remedy many of the weaknesses with folksonomies. Let me explain what I mean.

A folksonomy is a non-hierarchical, ad-hoc classification system, in contrast with many taxonomies, which are hierarchical and planned. The success of tags on del.icio.us and Flickr has made them a topic of interest; over the past week there has been quite a bit of talk about the subject.

Louis Rosenfeld describes some of the problems with folksonomies. Among these problems are ambiguity (the same tag can be used for multiple meanings, for example “metal” could mean a construction material or a kind of music) and diversity (an online video could be tagged as a “film”, “movie”, or “video”).

Ross Mayfield responds that despite their flaws, folksonomies are in many cases superior to more organized systems of classification. Their simplicity makes them feasible where enforcing a more structured taxonomy is unrealistic. I agree: my tags are inconsistent as it is; I can’t imagine trying to apply a more rigorous system. Even if I were to succeed, constructing a query for such a system would likely be nearly as challenging.

In many cases, it is probable that the problems of ambiguity and diversity can be remedied by using other information. Look at del.icio.us. Tags are only one item of information about a link: there is also the link itself, the associated user, the page title, the date, and any other tags associated with the item. (Going further, there is also the content of the web page itself, and all the information Google has available.)

Obviously, multiple tags on an item can help to disambiguate it: “vancouver us” is likely different from “vancouver canada”. Furthermore, users are probably fairly consistent: if I use “vancouver” to refer to Vancouver, B.C. in one place, I likely mean the same city elsewhere. Text in the title can help too.

On their own, these clues can increase the value of tags slightly. But del.icio.us has far more information: it knows who else has used the same tag, and who else has used the same link. Louis points out that this effectively turns del.icio.us into a thesaurus: tags can be statistically correlated by whether they are applied by multiple users to the same link. As the data set on del.icio.us grows, this thesaurus function becomes increasingly accurate.

Ambiguity is similarly vulnerable to attack. For example, if two users tag the same link with the same tag, it is likely that they mean the same thing. Remember, we can obtain hints of that meaning elsewhere – e.g. from link titles, other tags, etc. Then, because users tend to be consistent, we can extend this to predict the meaning of that tag in other circumstances.
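The thesaurus idea is easy to demonstrate. Treating bookmarks as (user, url, tag) triples – the data below is invented for illustration – we can count how often different users apply two tags to the same link, which is exactly the correlation that relates “film” to “movie”:

```python
# A sketch of the statistical thesaurus: tags applied by multiple users
# to the same link are likely related.

from collections import defaultdict
from itertools import combinations

def related_tags(bookmarks):
    """Count, over all links, how often each pair of tags co-occurs.

    bookmarks is an iterable of (user, url, tag) triples; the result maps
    alphabetically ordered tag pairs to co-occurrence counts.
    """
    tags_by_url = defaultdict(set)
    for user, url, tag in bookmarks:
        tags_by_url[url].add(tag)
    cooccur = defaultdict(int)
    for tags in tags_by_url.values():
        for a, b in combinations(sorted(tags), 2):
            cooccur[(a, b)] += 1
    return cooccur
```

High co-occurrence counts suggest synonyms (“film”/“movie”); combined with per-user consistency, the same counts help disambiguate a tag like “metal”.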

If we throw more information into the mix – e.g. Google’s full-text search – all of a sudden these folksonomies become remarkably precise. It is possible to click on a tag and tell, “give me all other links with this meaning”, or even “give me all other links with related meanings”. We might go so far as to ask for “all links about Vancouver B.C. and nearby cities”.

With networks, what we lose in accuracy, we can make up in volume.


How to Use Rel to Free Culture

I’ve been reading Lawrence Lessig’s excellent Free Culture. I first worried about copyright nearly ten years ago, and I have become increasingly disturbed. His vision of a soviet future of controlled culture terrifies me. There should be marches in the streets and activists on the steps of the VAG. But there aren’t: few understand how important this is. There must be a way for us to use our strengths to our advantage. Can we leverage the net and everyone on it who cares in a way that’s unambiguous to our politicians? I have an idea, but there’s a piece missing. Perhaps someone can help me1.

Recently a group of hackers invented XFN, a beautifully simple way of representing relationships between people on the web2. The same technique is also used to link to the Creative Commons license. Their site talks about implementing the ability to search for works according to license3. But these links aren’t just useful for people searching for art – they are also a political statement, and we must use them as such.

But why not use the same approach to link to political beliefs? If you cared about a belief, you would create a hyperlink to a site representing that belief and set the rel attribute of the link (I suggest rel="ideal") to indicate that this is something you support. Search engines could then use this information to build a map of public opinion, something politicians would love. Regular people could spontaneously construct decentralized networks of interest. Unlike a petition or newsgroup, this approach carries with it all the advantages of decentralization: there is no need for planning of any kind, and the system is resilient.
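A crawler building that map of public opinion would only need to tally rel="ideal" links. Here is a minimal sketch; the attribute value "ideal" is the suggestion above, not an existing standard, and the pages are invented:

```python
# Tally the targets of links marked rel="ideal" across a set of pages -
# the raw material for a decentralized map of public opinion.

from html.parser import HTMLParser

class IdealLinkCounter(HTMLParser):
    """Count hrefs of <a> tags whose rel attribute is exactly 'ideal'."""
    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("rel") == "ideal":
            href = a.get("href")
            self.counts[href] = self.counts.get(href, 0) + 1

def count_ideals(html_pages):
    counter = IdealLinkCounter()
    for page in html_pages:
        counter.feed(page)
    return counter.counts
```

Of course, this tally is exactly what the authentication problem below threatens: nothing here stops one group from generating thousands of pages.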

The problem is authentication. If such a scheme existed, how could we prevent a group from essentially spamming the net with links to a particular cause? This I don’t know. Maybe spam-filtering techniques will eventually be up to the task. Maybe we use IP addresses or email addresses (except we can’t due to spam) to make it more difficult to fake a cause. Is there some unrelated standardized key?


1 Update 2004-04-26: I just discovered there is already a standard for this called Vote Links. There are several discussions about it, including one at Many-to-Many.

2 FOAF already offered what XFN does, and more, but I think XFN hits that 80/20 point of simplicity. All it requires is that the rel attribute of an HTML hyperlink be set with one of several values, for example <a href="http:/" rel="friend">geof</a>.

3 They focus on RDF and regular links which I think is a mistake – the rel=”license” method is trivial to implement, and because it’s generic it could benefit from wider adoption (for example, it can also be used to link to the GPL).
