My Research on Reader Comments

I have uploaded my recently-completed Ph.D. thesis, Comment Space, which examines reader comment discussions on online news sites. I have also written a non-academic description of some of my analysis and findings.

Comments are important. A large proportion of Internet users read and write comments in response to news stories. Comment discussions are some of the few spaces where citizens with little in common take part in fierce arguments about political issues that affect us all. And studies have found that comments can influence readers: sometimes more than the articles they respond to.

I have developed techniques and technology for analyzing reader comment discussions to discover the sometimes unexpected things that are said, and to try to assess which arguments and points of view are most popular or have the most resonance. Commenters say things—sometimes important things—that journalists seldom do. I argue that some of the most widespread views about comment discussions, such as the idea that they should be like communities, are unhelpful.

Following my thesis research, I continue to examine comment discussions about important or interesting news topics. I have posted one such analysis. I intend to add more in the future.


The Internet is not Content

On Techdirt, Mike Masnick debunks a copyright maximalist who argues the Internet would be empty without the content industries. But in the discussion, someone suggests that the point is that the Internet would be empty without content – that the Internet is, in his words, “digital paper.”


Conversation is not “content”, and the Internet is not “digital paper.”

Most culture is not “content.” Is a pick-up street hockey game “content”? Is a conversation with a neighbor over the backyard fence “content”? Is a romantic dance “content”?

For most of human history, human culture has not been “content”. Even today, most culture – human interaction and activity – is not content. It is a practice and a flow, not a thing. The fact that human communication online happens to leave a trace does not make it “content.” Its “content-ness” is a side-effect. It is an epiphenomenon. It is not the thing – or rather the activity, the practice, the experience – itself.

Treating it as “digital paper” reduces practices to things. This is like reducing the journey to the map. Here I paused to admire the view, there I sat on a bench and ate my lunch, over there I watched a beautiful woman. You can draw a line to show my trajectory, but the essence of it, the point of it, the reason I turned this way and not that: all will be lost. (Credit to Michel de Certeau’s The Practice of Everyday Life for this example, and the next.) Or take a Chinese character. It appears to be a pattern on paper. But it is not just a shape in space. It is a movement in time. First I place my brush here, then I sweep there, I press, I lift and turn.

The Internet would no more be empty without content than would be a playground or a sports field or a sandy beach. The Internet would be empty without people. Digital paper. Hah.


Reader Comments

I want to apologize to all the people who ever took the time to comment on my blog. I just approved your comments – because I just discovered they existed. I somehow assumed my blog software would magically notify me when I received comments. Of course it never did. I figured either a) it’s really really hard to attract comments, or b) my software is broken somehow. So if you’re wondering why I’m such a jerk, I’m sorry.


Permission to Hate

The Internet and the technical community are host to a toxic culture. This culture allows and even encourages personal attacks, threats, and misogyny. This week, Kathy Sierra’s experience with death threats forced it into the public discourse. There is of course no excuse for the behavior of the individuals who harassed and threatened her. Yet they are only part of the problem. The solution rests not in finding, stopping, and punishing them (or helping them, for surely they are sad or sick) – although that is to be hoped for, it may be unlikely here and certainly is for the majority of such cases. It rests with others who give permission to such behavior – permission to hate.

I encountered this story via Tim Bray and others, but I’m going to concentrate on Slashdot. I pick Slashdot because it is a technical community, because I often find the discussion valuable, and because I don’t frequent the other places online that I understand are far worse. So it is on Slashdot that I have encountered a pattern of public permission for hatred. On one singular topic the community consistently breaks down and reveals its ugly side; that topic is women. The comments about women on Slashdot, the reactions of readers, and indeed my own behavior (or lack thereof) illustrate what I believe are several flawed attitudes which grant permission for bad behavior.

That’s Just How It Is

The Slashdot reaction to the Kathy Sierra story captures the problem attitudes1. There’s an acceptance – even a satisfaction – that this is “just how the Internet is”. It’s a “byproduct of the culture of the Internet . . . this sort of thing happens. . . . let’s try not to make more of it than it is.” This narrative of powerlessness in the face of human nature or technology is present even among those posters who support Kathy. One such poster encourages her not to allow unpleasantness to stop her from blogging, yet repeats the same story:

While I respect anyone in the public limelight, I think Kathy is being a tad bit naive. . . . Part of being a celebrity on any level for any topic means accepting that you gain both fame and infamy in parts.

If we simply accept bad behavior as inevitable, then we will do little or nothing to prevent it. Whether this is part of a statement of support or a criticism (see “Grow a Spine or Go Away” below), the perpetrators are given implicit permission for their actions.

The argument itself – that this kind of behavior is natural or inevitable – is demonstrably wrong. As several posters noted, and as I recall from my experiences online in the early 1990s, the degree of aggression used to be much less. It is possible to construct more civil online communities (never mind ones without death threats) – even anonymous ones. Furthermore, as I will detail when I argue about the practical implications, the Internet will change, and the reaction to this kind of abuse will influence whether that change is for the better.

Grow a Spine or Go Away

This is the most toxic attitude of the lot, perhaps best captured by one poster who concludes the following:

People are dicks. Life is hard. A lot of people say a lot of shit and don’t follow through. Either grow a spine or go away. There’s no sense being a big baby about it because someone hates you.

The individual who wrote this reports having been threatened in the past. I believe this is key: coping with abuse thus becomes a sort of hazing ritual required of those who participate online. The measure of an individual is the ability to withstand the pressure; one who fails – and apparently taking action against the abuse is a form of failure – is a “baby”.

This appears to be a particularly masculine approach (though I’m sure there are women who take this attitude, just as many or most men do not). Buried within it is a sort of misogyny, for it measures everyone by their ability to live up to a standard of toughness. In practice, women may be less likely to achieve that standard (because they are targeted more, because they complain) or may be excluded because they choose not to participate in a hateful or aggressive environment2.

The argument cloaks itself in a kind of claim to objectivity – the standard is fair because it’s the same for everyone. Yet this is clearly a lie, for the effect is to exclude people, like Kathy, whose participation is valuable. A common follow-on argument is that the alternative is distasteful censorship3. In the case of death threats this should be irrelevant. It’s also a red herring for other destructive (but legal) speech: cultural norms can be just as or more effective. The argument rejects not only the censorship but any more moderate form of social influence.

Practical Considerations

I mentioned that online aggression excludes people. This is particularly relevant for women because they are targets of sexual language and, I understand, of more frequent attacks in general. This is tremendously damaging to the technical community, as many within that community have been complaining for years. To give one simple illustration, many among the Slashdot community are ardent supporters of the Linux operating system: they would like to see Linux in general use. I can’t imagine this happening if half the population is alienated like this. The same applies to other technical, political, and social concerns – if the technical community wants to be listened to, it cannot afford to abuse people in general or women in particular.

The assumption that bad behavior is a fact of online life has a further implication. Those who hold it exclude themselves from processes of technical and social change. The current state of the Internet strikes a particular balance between freedom of speech and civility, between anonymity and responsibility, and so on. It is obvious to me from Kathy’s case that this balance must change. It will change: legislatures are already banning schools and children from using social networking sites. A variety of proposals aim to curb spam by eliminating anonymity. Some of these have been criticized for centralizing power and granting control to certain powerful players. If the Internet doesn’t clean up its act, someone else will. Those who pretend nothing can change because “that’s just how it is” will have no part in influencing how that happens.

The Rest of Us

The barbarians on the wire are a small minority. Some of them may be sad or sick and immune to social pressure, but I suspect the majority act as they do because the social environment of the Internet gives them permission to hate. The rest of us, when we are silent, grant that permission. Saying “no” is hard – it takes time, it takes effort, it’s hard to do well. It needs saying. Those like myself who haven’t said it before or enough need to say it more often4. It’s the right thing to do, and it’s the responsibility we need to take for our Internet and our society.


1 I want this to be about ideas, not an attack on individuals, so I’m not linking to specific comments. If you really need context, you can search for the comments in the article.

2 I myself have often chosen to “go away”. As a geek, I find this aggression particularly distasteful as I have been a target in the past. I hate to see my tribe inflicting its hurts on others. Unfortunately the technical culture has long shared a similar tendency to reject those who fail or choose not to cope with complexity or perversity. For example, when the complexity of certain software is criticized, there are those who reject any attempt to make it easier to use on the basis that smart people will learn it, and the stupid or unworthy will keep away. Such aggression ghettoizes the community.

3 One thoughtful poster contrasted the need for political freedom with the prospect of censorship. By the terms of the argument, I believe it’s correct – but I don’t accept the binary choice s/he presents:

[The Internet has] ALWAYS been a war zone. . . . Anyone who thinks it used to be all nice and safe is either delusional or wasn’t paying attention. If you have a forum where governments can’t track down and kill political opponents, you have a forum where nice people can’t track down and hold liable nogoodniks who froth hate. That sucks for the nice people, but I think our need for widespread, anonymous communication outweighs their discomfort.

4 There are many issues I consider writing about. Only a few make it to the screen. It’s easy to think a thing; hard to put it into words I won’t regret. I doubt I’ll post much more about this topic, but I hope that in future I will at least say something when it’s obvious something needs to be said.



Yesterday I attended Moosecamp at the Northern Voice blogging conference. Highlights for a couple of sessions follow (selected simply because these are the ones for which I have the most notes). I’m afraid I don’t know the names of most who spoke, so can’t credit them.

Edublogger Hootenanny

This was primarily a discussion about how students (elementary to post-secondary) could use blogs in education. The first theme emerged from the difference between forums, which are public spaces, and blogs, which are owned1. Scott Leslie asked how educators can give ownership when they need to set boundaries to protect children. The question then became whether it would be more productive to teach critical thinking and decision making rather than exerting control. Scott pointed out that this is a matter of control by parents who are often less net-literate than their children. Someone in the audience asked how schools could hope to protect students who are already online, to which D’Arcy Norman replied that the schools have legal liability.

The second theme was holism. When a student writes a blog for a class, the blog may expire with the end of the class. There’s no history. Worse, students may be required to maintain blogs for multiple classes. One teacher in the session had experienced students who would copy and paste between blogs in order to fulfill requirements. This leads back to the ownership issue: a blog is an owned space, an instrument of identity. When that is fragmented in time and space – in this case by following the standard model of education2 – the blogger will tend to lose heart.

Big Media Strikes Back: Bluffton Today and the Future of Print

This presentation by Ken Rickard was sparsely attended in comparison with the Windows Vista demo in the packed room next door3. Apparently Bluffton Today is a small newspaper (owned by a big media company) in South Carolina. At first I was concerned it was a marketing demo. In fact, it was very interesting.

The paper’s front page shows blog posts and photos from the community above the stories written by journalists. The idea is not new. It’s something I desperately want to see at my university4, in my neighborhood, in my building. What’s exciting is to see it finally happening and succeeding.

Ken showed what he said was a standard slide showing a bulls-eye. From the center outward, it was labeled Personal, Social, Local, and Global. He said: “Newspapers live on the outer rims. The audience lives in the center. The Internet connects the two.” He added, “actually, we’re not connecting the two, we’re transitioning.” Other words: “It’s not the process, it’s the product. The problem we have with newspapers is that they’re only concerned with the product.” I.e., the focus is on producing and publishing an edition, then moving on to the next.


1 A woman in a late session described adding blogs to an established community that already had forums, and found they were very different: the blog emphasized the person, not the topic.

2 What has been called a factory or assembly-line model; see John Gatto for a strong view on this.

3 I never upgraded from Windows 2000 to XP because I dislike and distrust Windows Activation, so Vista holds very little interest for me (the built-in DRM hardly enthuses me either).

4 From my experience, SFU provides virtually no facilities for online community to the university at large.



I just attended a session about microformats at Northern Voice. Following an explanation of what they are, there were a great many questions about why – and I don’t think the audience found the answers compelling. Let me explain why I am excited about microformats1.

This technology could enable more intelligent searches – e.g. a search for music reviews that actually found reviews, not just sites selling the CD. It could make it possible to click on an event (like Northern Voice) on a web page and have it added to your calendar automatically, to copy a full address and phone number to your desktop address book, to forward a forum posting to an email or wiki with one or two mouse clicks2.

The technology to do all of these things is straightforward. What’s needed is standards. There must be a simple, common method for indicating on a web page what’s a review, what’s a calendar event, what’s an address. With that information, it’s not very hard to add these kinds of capabilities to web browsers, search engines, or email clients.
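To make the calendar case concrete, here is a minimal sketch in Python of how a tool might pull an event out of a page once the page marks it up in a standard way. The class names (vevent, summary, dtstart, location) come from the hCalendar microformat; the sample page, its event name, date, and place are invented placeholders, and the parser is deliberately naive.

```python
from html.parser import HTMLParser

# Property names taken from the hCalendar microformat.
FIELDS = {"summary", "dtstart", "location"}


class EventParser(HTMLParser):
    """Collect hCalendar-style properties once a 'vevent' element is seen.

    Simplification: this does not track where the event ends, so it only
    suits a page (or fragment) holding a single event.
    """

    def __init__(self):
        super().__init__()
        self.event = {}
        self._in_event = False
        self._current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        if "vevent" in classes:
            self._in_event = True
        if not self._in_event:
            return
        for name in FIELDS & classes:
            if "title" in attrs:
                # Machine-readable value, e.g. <abbr title="2006-01-01">
                self.event[name] = attrs["title"]
            else:
                self._current = name
                self.event[name] = ""

    def handle_data(self, data):
        if self._current:
            self.event[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


# Invented sample markup; the values are placeholders.
page = """
<div class="vevent">
  <span class="summary">Example Conference</span>,
  <abbr class="dtstart" title="2006-01-01">January 1</abbr>,
  <span class="location">Example Hall</span>
</div>
"""

parser = EventParser()
parser.feed(page)
print(parser.event)
```

A browser extension or calendar program could hand a dictionary like this straight to its "add event" routine, which is all the one-click scenario requires.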

I imagine that my annotation system, Marginalia, would not need to be integrated into a web service like Moodle. It could be used to add highlighting and margin notes to individual blog or forum posts, regardless of what site those posts are on. All that would be needed is a link on your browser’s bookmark bar to activate annotation. Or, something like my smartcopy feature could be enabled automatically. Whenever you copy a quote from a blog or forum post, the title, author, and URL of that post would be included automatically.

In fact, there are standards for many of these things today. We have the technology, we have the standards, we just need people to use them.

As the group at Northern Voice recognized, that’s the key problem. It’s very difficult to persuade web page and blog authors to include the extra bit of structure needed to make microformats work. They don’t see what’s in it for them. The consensus was that the tools need to help out. I hope my examples above show that there is a benefit to authors; that once the technology gains a foothold the benefits of making that little extra effort will make it compelling.

My examples should also confirm that there is an important role for tool makers today. Much of this microformat structure information is already present in the tools. There’s no need for authors to specify the title, author, and date of blog or forum posts: the information is already there. It’s just not structured in a standard way.

While I think my examples are valid, I don’t think they touch on the real benefit of microformats. My predictions are like the advertising for the early personal computers, which claimed the machine would help organize recipes. Like personal computers and the web before, we won’t know the power of the technology until people start playing with it and having ideas. How many people thought the hyperlink would be so powerful, the basis for accurate searches, challenges to hierarchy, and citizen journalism?


1 Here’s a brief explanation of microformat technology. Web pages are already structured: they have titles, paragraphs, lists, links, and so on. This information is useful – for example, when you create a bookmark, your browser uses the title of the web page as the bookmark text. When you click on a link, the browser knows where to take you. Search engines follow links in order to figure out how popular pages are. But the structure of standard web pages can only take you so far. For example, there’s no way of indicating that a web page contains a movie review, or the title of the movie being reviewed. Similarly, there’s no way to indicate a calendar event or a blog post. Microformats are a way of adding this kind of structure to a page. This is semantic web technology, and there are other standards that could be applied. The advantage of microformats is that they are simple: they require nothing more than HTML, i.e. no changes to the existing technological infrastructure.

2 What these examples have in common is that they give power and choice to people, rather than to central organizations. (There is no guarantee that this will remain the case; people may still choose to develop or use centralized applications. But at least there is no requirement to do this.)


Don't Charge for Email

Tim Bray has proposed a micropayment system to combat spam. A central authority – he suggests the post office – would issue “stamps” for perhaps $0.01. Then, each email message, blog post, etc. would have to prove that it had been paid for by a stamp. This low cost, he suggests, would have little effect on most people, but would be fatal to spammers. I am an admirer of Tim’s blog, but I think he’s dead wrong: this is one of the worst ideas I’ve seen in some time1. There are two main problems, revolving around cost & access, and centralization & innovation.

Cost & Access

A penny for an email message may not sound like much for many people. But for the poor it could effectively ration Internet communication. A homeless job seeker might be able to afford the payment to send out a batch of resumes, but the cost is one more reason not to. In the third world, a penny an email could quickly start to look like real money. Tim says the cost is negligible; he, for example, wouldn’t send more than 100 messages a day. I beg to differ. Thirty dollars a month is the cost of a cell phone or of broadband Internet.
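The thirty-dollar figure follows directly from the numbers above. As a quick check, assuming a 30-day month (the only number here that is not in the text):

```python
# Figures from the paragraph above: one cent per message, 100 messages a day.
STAMP_CENTS = 1
MESSAGES_PER_DAY = 100
DAYS_PER_MONTH = 30  # assumption: a 30-day month

# Work in whole cents to avoid floating-point rounding.
monthly_cents = STAMP_CENTS * MESSAGES_PER_DAY * DAYS_PER_MONTH
print(f"${monthly_cents / 100:.2f} per month")  # prints "$30.00 per month"
```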

Furthermore, it would be extremely tempting for the agency issuing the stamps to raise the price to whatever the market would bear. The lobbying might be hard to resist: why not let the free market decide and charge what the service is worth? Markets are supposed to be good at allocating scarce resources. But stamps aren’t scarce: any profit obtained from selling stamps is the product of an artificial monopoly – a monopoly which could be expected to raise the price at every opportunity. Is there any sufficient argument for arbitrarily limiting email access based on wealth2?

Centralization & Innovation

There are two forms of centralization here. The first is the central authority which issues the stamps. It becomes a critical gateway for access to Internet communication. Such power is immediately ripe for abuse, e.g. by lobbying politicians for an increase in the stamp rate.

But there is a second, invisible form of centralization here. Software for sending, routing, and delivering messages now has to include the ability to create, pass on, and authenticate stamps. It must do this in conjunction with the central authority (which therefore has the ability to decide who can authenticate and who can’t). Worse, it complicates the software. As with most complexity, this insulates incumbents from challengers. Authors of Internet communication software would need greater expertise in order to deal with the stamp system, and added resources in order to implement support for it. This added cost might mean little to Microsoft, but for the lone hacker it might be enough to stop new software from being developed. This would likely fall especially hard on the open source community.
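A toy sketch shows where the dependency comes from. This is not Tim’s design, which does not specify a mechanism; the StampAuthority class, the HMAC-based stamps, and the deliver function are all hypothetical, chosen only to show that checking a stamp means going back to whoever holds the key.

```python
import hashlib
import hmac


class StampAuthority:
    """Hypothetical central issuer: it alone holds the signing key."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self._spent = set()  # stamps that have already been redeemed

    def _sign(self, message: bytes) -> str:
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def issue(self, message: bytes) -> str:
        # A real scheme would charge here and would need unique serial
        # numbers; this toy version just signs the message.
        return self._sign(message)

    def redeem(self, message: bytes, stamp: str) -> bool:
        if not hmac.compare_digest(self._sign(message), stamp):
            return False  # forged or mismatched stamp
        if stamp in self._spent:
            return False  # stamp already used once
        self._spent.add(stamp)
        return True


def deliver(authority: StampAuthority, message: bytes, stamp: str) -> str:
    """What every mail program would need: a round trip to the authority."""
    return "delivered" if authority.redeem(message, stamp) else "rejected"


authority = StampAuthority(b"demo-secret")
message = b"Lunch on Friday?"
stamp = authority.issue(message)

first = deliver(authority, message, stamp)
second = deliver(authority, message, stamp)      # same stamp reused
forged = deliver(authority, message, "0" * 64)   # made-up stamp
print(first, second, forged)
```

Even in this toy form, delivery cannot be decided locally: every program that handles a message has to consult the authority, and the authority decides which programs it will answer.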

The added complexity likely also has compound and unexpected interactions with other parts of the Internet communication infrastructure. New uses for the technologies might never see the light of day because of conflicts with the stamp system.

Meanwhile, the cost of stamps (or even just the perceived cost) would affect behavior: users would start finding ways to minimize stamps, e.g. by using blogs or wikis instead of email, even where they are less appropriate3.

As I said, it’s a bad idea. I can’t imagine it will come to pass. My real concern is that we in the software community too often forget the burdens imposed by the costs and complexity of our technologies. As for splogs (spam blogs), I suspect the long term solution is identity, which while it solves many more problems, shares many of the disadvantages of the stamp scheme.


1 Actually, I’ve seen the idea of small charges to email messages proposed numerous times as a way to combat spam. I have even thought it was a good idea. I was wrong too.

2 A thought-experiment bears this out. What if the market were more democratic? Say every person was assigned 100 stamps per day for free. This would preserve access for the poor. But it is immediately obvious that some people would have a legitimate need for more stamps.

3 Updated (minutes later): Of course Tim is proposing that only end-user recipients would reject unstamped communications. So what I had written doesn’t apply: “Inevitably, I suspect, an independent communication system would arise – one not reliant on the stamp infrastructure. At this point, only draconian legislation or arbitrary control by ISPs – limiting technologies and their uses – could save the system.” Of course, ISPs might start requiring stamps in which case a parallel infrastructure is entirely possible.


Annotation, Microformats & Syndication

Syndication feeds (RSS and Atom) provide precise information: the title of each entry, the author, when it was created, when it was modified, a unique identifier for the post, the content of the post without any surrounding menus, graphics, advertising, etc. This metadata supports many of the features of aggregators and blog search tools. But there’s a problem: the entries in a feed don’t last. Without a standard way to represent the information in HTML, it is lost to the Web. As far as I know, no such standard exists.
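To show how precise this information is, here is a minimal sketch in Python that reads the fields listed above from an Atom feed. The element names and namespace are those of Atom 1.0; the feed itself, including its author and identifier, is an invented example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 feeds.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample feed with a single entry.
feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <entry>
    <id>tag:example.org,2005:post-1</id>
    <title>First post</title>
    <author><name>A. Blogger</name></author>
    <published>2005-05-01T12:00:00Z</published>
    <updated>2005-05-02T09:30:00Z</updated>
    <content type="text">Just the post body, no menus or ads.</content>
  </entry>
</feed>"""


def read_entries(xml_text):
    """Return one dictionary of metadata per <entry> in the feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        entries.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "author": entry.findtext("atom:author/atom:name", namespaces=ATOM),
            "published": entry.findtext("atom:published", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", namespaces=ATOM),
            "content": entry.findtext("atom:content", namespaces=ATOM),
        })
    return entries


entries = read_entries(feed_xml)
for item in entries:
    print(item["title"], "by", item["author"], "updated", item["updated"])
```

Nothing comparable can be done with the HTML page for the same post unless the page marks these fields in some agreed way.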

Why is this a problem? Well, for a start it means that if I suddenly add a large number of entries to my blog, Technorati, Feedster, PubSub et al. will not index the older ones unless they are in the feed. Furthermore, if someone comes up with a cool new blog search technology or the like, much of the data will simply not be out there to be indexed. (This also increases the first-mover advantage of the existing services, which have already indexed the no-longer-available data.)

Blog search tools are not the only services that could use the data. This is also a problem with my ideas for integrating wikis and forums. It even turns out to be relevant within an AJAX application, like the work I’m doing on web annotation.

My annotation implementation allows users to highlight text in a forum post and add notes in the margin, as one might underline text and write notes in a paper book. Each of these annotations is stored in a database, along with information about the annotated post, such as the post’s ID, title and author. I store these with the annotation so that they can be retrieved independently and treated consistently regardless of what was annotated; this makes it much easier to add annotation to other web applications. The Javascript that manages the display and editing of annotations also needs to locate the post on the web page so that it can show the highlights and margin notes.

This metadata (post title, author, etc.) is already present on the page. So I added CSS classes to the HTML to indicate what is a post, and for each post which tags contain the title, the author, the body text, etc. This is great: adding support for annotation to a page requires inclusion of the Javascript and a few minor tweaks to the HTML.

One question remains: what should these CSS classes be? Is there a standard out there for this sort of thing?

Well, there is Dublin Core. I could use dc:title as a class on the HTML element containing a post title. Of course HTML 4 doesn’t support namespaces, but at least for human beings who have worked with Dublin Core this is crystal clear, and unlikely to conflict with any classes in Moodle. But this is considered evil.

A more promising precedent is the use of microformats. The hReview microformat, for example, uses classes in a regular HTML document to specify a review of a movie, book, etc. These use simple, unqualified classes like “title”, “author”, etc. But I can’t find a microformat for solving the problem of syndication feeds. For the moment, I’m basing my classes on the names of tags in the Atom specification.
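As an illustration of the approach, here is a minimal sketch in Python of a page marked up with Atom-inspired classes and a parser that recovers each post’s metadata from them. The class names (entry, title, author, content), the sample forum markup, and the PostParser helper are all hypothetical; they are not the classes or code used in my annotation implementation.

```python
from html.parser import HTMLParser

# Hypothetical class names, borrowed from Atom element names.
POST_CLASS = "entry"
FIELD_CLASSES = {"title", "author", "content"}


class PostParser(HTMLParser):
    """Find posts and their fields by CSS class.

    Limitation: assumes every start tag has a matching end tag, so void
    elements such as <br> inside a post would confuse it.
    """

    def __init__(self):
        super().__init__()
        self.posts = []
        self._open = []  # one marker per open element: "POST", a field, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        marker = None
        if POST_CLASS in classes:
            self.posts.append({"id": attrs.get("id") or ""})
            marker = "POST"
        elif "POST" in self._open:
            fields = FIELD_CLASSES & classes
            if fields:
                marker = sorted(fields)[0]
                self.posts[-1][marker] = ""
        self._open.append(marker)

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to the innermost marked element, if that is a field.
        for marker in reversed(self._open):
            if marker is not None:
                if marker != "POST":
                    self.posts[-1][marker] += data
                break


def find_posts(html):
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return [{key: value.strip() for key, value in post.items()}
            for post in parser.posts]


# Invented forum page with two posts.
page = """
<div class="entry" id="post-42">
  <h3 class="title">Re: Week 3 reading</h3>
  <span class="author">Sam</span>
  <div class="content"><p>I disagree with the second point.</p></div>
</div>
<div class="entry" id="post-43">
  <h3 class="title">Re: Week 3 reading</h3>
  <span class="author">Lee</span>
  <div class="content"><p>Which point do you mean?</p></div>
</div>
"""

posts = find_posts(page)
for post in posts:
    print(post["id"], post["author"], "-", post["title"])
```

The same few classes would let any tool, not only an annotation script, work out which part of the page is which post.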


Threaded Blog Aggregation

Proponents like to say that blogs are conversations. And they are: many have comments; many posts link to others and those links are discoverable through blog search engines. I want to see threaded blog-reading software. Because compared to the conversations going on in email, or in discussion forums going back to the heyday of Usenet or Fidonet and BBSes in the 1980s, blogs hardly qualify.

It shouldn’t be that hard. A syndication feed – whether RSS or Atom – provides all the information needed to build a view of a discussion: a post’s author, the subject, links to what this post is replying to, when it was written, folksonomy categories. That’s everything needed to build a threaded view of a discussion. This is made a little more complicated because a post can reply to more than one other, but that means there’s too much information, not too little. With cleverness, this could be used to construct a multi-dimensional view of a discussion.
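Here is a minimal sketch in Python of the threading step, assuming the posts have already been gathered from feeds and their reply links resolved to post identifiers. The post records, field names, and authors are invented for illustration. A post that replies to two others simply appears under both.

```python
from collections import defaultdict

# Hypothetical posts, as they might be gathered from several feeds.
posts = [
    {"id": "a1", "author": "Ann", "title": "Blogs are conversations",
     "written": "2005-06-01", "replies_to": []},
    {"id": "b1", "author": "Bob", "title": "Not quite",
     "written": "2005-06-02", "replies_to": ["a1"]},
    {"id": "c1", "author": "Cy", "title": "A middle view",
     "written": "2005-06-03", "replies_to": ["a1", "b1"]},
]


def build_threads(posts):
    """Return (root ids, parent id -> list of reply ids), oldest first."""
    by_id = {post["id"]: post for post in posts}
    children = defaultdict(list)
    roots = []
    for post in sorted(posts, key=lambda p: p["written"]):
        # Ignore links to posts we do not have (e.g. expired from a feed).
        parents = [pid for pid in post["replies_to"] if pid in by_id]
        if not parents:
            roots.append(post["id"])
        for pid in parents:
            children[pid].append(post["id"])
    return roots, children


def render_threads(posts):
    """Return the discussion as indented lines of text."""
    by_id = {post["id"]: post for post in posts}
    roots, children = build_threads(posts)
    lines = []

    def walk(post_id, depth, ancestors):
        if post_id in ancestors:
            return  # guard against circular reply links
        post = by_id[post_id]
        lines.append("  " * depth + f"{post['author']}: {post['title']}")
        for child_id in children.get(post_id, []):
            walk(child_id, depth + 1, ancestors | {post_id})

    for root_id in roots:
        walk(root_id, 0, frozenset())
    return lines


thread_lines = render_threads(posts)
print("\n".join(thread_lines))
```

Showing the doubly-linked post twice is the simplest choice; a cleverer reader could instead draw it once with two incoming links.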

Jon Andersen suggested to me the possibility of mixing in other messages, such as emails. This is not public information, but from the point of view of the recipient it can be just as much a part of a conversation or relationship as blog posts and comments.

I see one hitch, something that I think needs to be resolved for any number of uses of syndication technology. Feeds expire – they only include the most recent posts. Right now, only the big search engines like Technorati and Feedster have a history of expired blog entries. It would be better if a site were capable of generating feed metadata for individual pages, no matter how old they might be, so that permalinks from other posts could be resolved. As it happens, this is similar to a problem I ran into in my web annotation project, so I may talk about it soon.


Newspapers Can Benefit from Blogs

The New York Times announced today that they intend to start charging for online content. In my recent study of blogs, I found that linking to mainstream media stories was extremely common; quoting them even more so. If these results are representative, then it seems to me that blogs are serving as a new distribution channel for the media. If that’s true, then the Times is doing exactly the wrong thing.

The numbers shocked me. I found that the most common source of block quotes in my sample was mainstream media – by a long shot. Fully 63% of posts with relevant quotes excerpted mainstream media stories. The media were also the main target of links: of those posts with relevant links, 53% linked to mainstream media stories. Links to other blogs were a distant second; the comparable numbers were 20% and 27% respectively.

These posts are not competing with the media: they are extending it. There are millions of bloggers out there; it seems likely that many of them are passing on what the media report and giving credit back to the journalists. This is free advertising for a business that’s losing its readers. Discouraging the bloggers by asking them to pay is exactly the wrong way to respond.

Many papers allow free access to new stories, then charge for access to older ones. The unique advantage of the news media is currency: with journalists in the field, they can be first with the news – hence the word, “news”. It makes no sense for them to monetize a secondary asset – their archives – at the expense of their main line of business. (In a digital economy, it may also be unwise to outsource their reporting to wire services and turn themselves into middlemen.)

But even charging for the new stories wouldn’t make sense. Newspapers make money in two ways: first by selling newspapers, but more importantly by selling advertising. Circulation is king. By charging for access, they are trading one business model for another. That’s a tremendous risk in a world where those very blogs that could enhance their influence could also constitute an alternative.

Good newspapers like the Times can be valuable members of the online community; or, they can cut themselves off and hope they’re powerful enough to go it alone. I care because that would be bad for the community. They should care because it would be even worse for them.

I see two points against my argument. First, some people don’t believe blogs are so significant. I think they are. They are one of the richest sources of links, which are the currency of the Web. Second, subscriptions could outperform advertising. I doubt this, but I would be thrilled if it were true: it would realign the interests of newspapers with their readers, rather than their current corporate customers.
