Networked Folksonomies

I suspect that, with some thought, other information can be leveraged to remedy many of the weaknesses of folksonomies. Let me explain what I mean.

A folksonomy is a non-hierarchical, ad-hoc classification system, in contrast to many taxonomies, which are hierarchical and planned. The success of tags in del.icio.us and Flickr has made them a topic of interest; over the past week there has been quite a bit of talk about the subject.

Louis Rosenfeld describes some of the problems with folksonomies. Among these problems are ambiguity (the same tag can be used for multiple meanings, for example “metal” could mean a construction material or a kind of music) and diversity (an online video could be tagged as a “film”, “movie”, or “video”).

Ross Mayfield responds that despite their flaws, folksonomies are in many cases superior to more organized systems of classification. Their simplicity makes them feasible where enforcing a more structured taxonomy is unrealistic. I agree: my tags are inconsistent as it is; I can’t imagine trying to apply a more rigorous system. Even if I were to succeed, constructing a query for such a system would likely be nearly as challenging.

In many cases, the problems of ambiguity and diversity can probably be remedied by using other information. Look at del.icio.us. Tags are only one item of information about a link: there is also the link itself, the associated user, the page title, the date, and any other tags associated with the item. (Going further, there is also the content of the web page itself, and all the information Google has available.)

Obviously, multiple tags on an item can help to disambiguate it: “vancouver us” likely refers to a different place than “vancouver canada”. Furthermore, users are probably fairly consistent: if I use “vancouver” to refer to Vancouver, B.C. in one place, I likely mean the same city elsewhere. Words in the page title can help too.
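To make this concrete, here is a minimal Python sketch of disambiguating a tag from the other tags and title words on the same bookmark. The SENSES table of cue words and the guess_sense function are my own hypothetical illustration, not anything del.icio.us provides, and the scoring is simply a count of matching cue words.

```python
import re

# Hypothetical cue words for each sense of an ambiguous tag. A real system would
# learn these from data; they are hand-written here purely for illustration.
SENSES = {
    "vancouver": {
        "Vancouver, B.C.": {"canada", "bc", "burnaby", "whistler"},
        "Vancouver, WA": {"us", "usa", "washington", "wa", "portland"},
    },
}


def guess_sense(tag, other_tags, title):
    """Guess which sense of `tag` a bookmark uses, from its other tags and title."""
    senses = SENSES.get(tag)
    if not senses:
        return None
    # Evidence: the bookmark's other tags plus the lower-cased words of its title.
    evidence = {t.lower() for t in other_tags}
    evidence |= set(re.findall(r"[a-z]+", title.lower()))
    scores = {sense: len(cues & evidence) for sense, cues in senses.items()}
    ranked = sorted(scores.values(), reverse=True)
    # No matching cue, or a tie between senses: leave the tag ambiguous.
    if ranked[0] == 0 or (len(ranked) > 1 and ranked[0] == ranked[1]):
        return None
    return max(scores, key=scores.get)


print(guess_sense("vancouver", ["canada"], "Ferry schedule"))  # Vancouver, B.C.
print(guess_sense("vancouver", [], "Hotels in Washington state"))  # Vancouver, WA
print(guess_sense("vancouver", ["travel"], "Weekend getaways"))  # None
```

A user's earlier choices could serve as a fallback when the cues are silent; this sketch leaves that out.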

On their own, these clues increase the value of tags only slightly. But del.icio.us has far more information: it knows who else has used the same tag, and who else has bookmarked the same link. Louis points out that this effectively produces a thesaurus: tags can be statistically correlated according to whether multiple users apply them to the same link. As the data set on del.icio.us grows, this thesaurus becomes increasingly accurate.
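Here is a minimal Python sketch of such a thesaurus, assuming bookmarks are available as (user, url, tag) triples. It rates two tags as related by the Jaccard similarity of the sets of links they have been applied to; that measure, the 0.5 threshold, and the tag_thesaurus name are my own choices for illustration, not necessarily what Louis has in mind.

```python
from collections import defaultdict
from itertools import combinations


def tag_thesaurus(bookmarks, min_similarity=0.5):
    """Relate tags that are applied to the same links.

    bookmarks: iterable of (user, url, tag) triples.
    Returns {tag: [(related_tag, similarity), ...]}, most similar first.
    """
    urls_by_tag = defaultdict(set)
    for _user, url, tag in bookmarks:
        urls_by_tag[tag].add(url)

    related = defaultdict(list)
    for a, b in combinations(sorted(urls_by_tag), 2):
        shared = urls_by_tag[a] & urls_by_tag[b]
        if not shared:
            continue
        # Jaccard similarity: links carrying both tags / links carrying either.
        sim = len(shared) / len(urls_by_tag[a] | urls_by_tag[b])
        if sim >= min_similarity:
            related[a].append((b, sim))
            related[b].append((a, sim))
    for pairs in related.values():
        pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(related)


bookmarks = [
    ("alice", "http://example.com/clip1", "film"),
    ("bob", "http://example.com/clip1", "movie"),
    ("carol", "http://example.com/clip1", "video"),
    ("alice", "http://example.com/clip2", "film"),
    ("bob", "http://example.com/clip2", "movie"),
    ("dave", "http://example.com/recipe", "cooking"),
]
print(tag_thesaurus(bookmarks)["film"])  # [('movie', 1.0), ('video', 0.5)]
```

The user field is carried along but not used; counting how many distinct users produce each co-occurrence would be a natural refinement.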

Ambiguity is similarly vulnerable to attack. For example, if two users tag the same link with the same tag, they likely mean the same thing. Remember, we can obtain hints of that meaning elsewhere, e.g. from link titles and other tags. Then, because users tend to be consistent, we can extend this to predict the meaning of that tag in their other bookmarks.
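A minimal Python sketch of that reasoning, assuming (user, url, tag) triples plus a few seed meanings already inferred from titles or co-occurring tags. The propagate_senses name and the seed format are hypothetical. It applies the two rules from the paragraph above repeatedly until nothing changes; when they conflict, the first meaning assigned wins, which a real system would replace with some form of voting.

```python
def propagate_senses(bookmarks, known):
    """Spread known tag meanings across users and links.

    bookmarks: list of (user, url, tag) triples.
    known: {(url, tag): sense} seeds inferred from other hints
           (titles, co-occurring tags, ...).
    Returns {(url, tag): sense} for every pair that could be labelled.
    """
    link_sense = dict(known)  # (url, tag) -> sense
    user_sense = {}           # (user, tag) -> sense
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for user, url, tag in bookmarks:
            # Same link, same tag: assume this user means what the link's tag means.
            if (url, tag) in link_sense and (user, tag) not in user_sense:
                user_sense[(user, tag)] = link_sense[(url, tag)]
                changed = True
            # Users are consistent: carry the user's meaning to their other links.
            if (user, tag) in user_sense and (url, tag) not in link_sense:
                link_sense[(url, tag)] = user_sense[(user, tag)]
                changed = True
    return link_sense


U = "http://example.com/"
bookmarks = [
    ("alice", U + "ironmaiden", "metal"),
    ("bob", U + "ironmaiden", "metal"),
    ("bob", U + "slayer", "metal"),
    ("dave", U + "slayer", "metal"),
    ("dave", U + "megadeth", "metal"),
    ("carol", U + "welding", "metal"),
    ("carol", U + "steel", "metal"),
]
known = {(U + "ironmaiden", "metal"): "music", (U + "welding", "metal"): "material"}
senses = propagate_senses(bookmarks, known)
print(senses[(U + "megadeth", "metal")])  # music
print(senses[(U + "steel", "metal")])     # material
```

Note that dave never bookmarks the seeded link, yet his “megadeth” bookmark is labelled as music through the chain dave, slayer, bob.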

If we throw more information into the mix (e.g. Google’s full-text search), these folksonomies suddenly become remarkably precise. It becomes possible to click on a tag and ask for “all other links with this meaning”, or even “all other links with related meanings”. We might go so far as to ask for “all links about Vancouver, B.C. and nearby cities”.
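As a rough sketch of such a query, suppose each link has already been labelled with one or more meanings, and that some table records which meanings are related. Both the LABELS data and the RELATED table below are hypothetical, hand-made examples, and the full-text search is left out entirely; the point is only that “related meanings” becomes a simple set expansion once meanings exist.

```python
# Hypothetical meaning labels per link, e.g. produced by some disambiguation step.
LABELS = {
    "http://example.com/stanley-park": {"Vancouver, B.C."},
    "http://example.com/burnaby-library": {"Burnaby, B.C."},
    "http://example.com/fort-vancouver": {"Vancouver, WA"},
}

# Hypothetical table of related meanings (here: nearby cities).
RELATED = {
    "Vancouver, B.C.": {"Burnaby, B.C.", "North Vancouver, B.C."},
}


def links_with_meaning(labels, sense, related=None):
    """Return links labelled with `sense`, plus related senses if `related` is given."""
    wanted = {sense}
    if related is not None:
        wanted |= related.get(sense, set())
    return sorted(url for url, senses in labels.items() if senses & wanted)


print(links_with_meaning(LABELS, "Vancouver, B.C."))
# ['http://example.com/stanley-park']
print(links_with_meaning(LABELS, "Vancouver, B.C.", RELATED))
# ['http://example.com/burnaby-library', 'http://example.com/stanley-park']
```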

With networks, what we lose in accuracy, we can make up in volume.