A simple thought experiment in weblog semantics
1:37 PM – Joe makes a post on his blog about his golf vacation on Prince Edward Island
3:24 PM – Joe makes a post on his blog about hang gliding in Newfoundland
Two weeks pass.
Sam searches Google for hang gliding in Prince Edward Island. Joe’s blog is the first result.
The trouble with this picture is that Joe never wrote anything about hang gliding on Prince Edward Island. The thought never even crossed his mind.
This is a fundamental problem with searching by keywords – the page is not necessarily the finest unit of web content. Often, especially on weblogs, any given page will have dozens of completely independent posts – made by different authors, on different days, about different topics. Google has no way to tell one post from another and can only link to general archive pages rather than to individual posts.
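To make the failure concrete, here is a rough Python sketch of keyword matching at the page level versus the post level. The post texts and the `matches` helper are made up for illustration; real search engines are far more sophisticated, but the unit-of-content problem is the same.

```python
# Hypothetical data: one archive page holding Joe's two posts from the example.
posts = [
    "Golf vacation on Prince Edward Island",
    "Hang gliding in Newfoundland",
]
page = " ".join(posts)  # the archive page as a search engine sees it: one blob of text

def matches(text, query):
    """Return True if every query keyword appears somewhere in the text."""
    words = set(text.lower().split())
    return all(term in words for term in query.lower().split())

query = "hang gliding in Prince Edward Island"

# Page-level matching: the keywords are spread across two unrelated posts.
page_hit = matches(page, query)

# Post-level matching: no single post contains all of the keywords.
post_hits = [p for p in posts if matches(p, query)]
```

Here `page_hit` is true even though `post_hits` comes back empty, which is exactly Sam's misleading search result.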
With all the talk about separating design from content and encoding semantics, it occurred to me that this current level of separation isn’t particularly useful to most of us. It can help with accessibility, which is important, but the cruel fact is that for truly semantic code to become universal, people are going to have to see concrete results (i.e. cool stuff happening).
BlogML (weblog markup language) doesn’t exist yet (as far as I know), but I think it could save us.
The average weblog has a relatively simple set of fields for each post: title, author, date/time, permanent URL, # of replies, URL of replies, and main content (I’m sure I’m missing some, but you get the idea).
If we could somehow code our weblogs with this structure, Google and other services would be able to see the content as it really is: a loose collection of independent posts. When Google indexes weblog archives, search results could include individual blog posts with the appropriate links (rather than linking to an archive page with 30 posts).
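As a sketch of what that structure buys us, the fields listed above can be written down as a simple record, and a search can then return the permanent URL of each matching post instead of the archive page. Everything here is hypothetical: the `Post` class, its field names, the `search` function, and the example.com URLs are illustrations, not part of any existing format or service.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Post:
    """One weblog post, using the fields listed above (names are assumptions)."""
    title: str
    author: str
    posted_at: datetime
    permalink: str
    reply_count: int
    replies_url: str
    content: str

def search(posts, query):
    """Return the permalink of every post that contains all query keywords."""
    terms = query.lower().split()
    results = []
    for post in posts:
        words = set(f"{post.title} {post.content}".lower().split())
        if all(term in words for term in terms):
            results.append(post.permalink)
    return results

# Hypothetical archive: Joe's two posts, each with its own permanent URL.
archive = [
    Post("Golf vacation", "Joe", datetime(2002, 6, 1, 13, 37),
         "http://example.com/joe/archive.html#post1", 0,
         "http://example.com/joe/replies/1",
         "Played golf on Prince Edward Island"),
    Post("Hang gliding", "Joe", datetime(2002, 6, 1, 15, 24),
         "http://example.com/joe/archive.html#post2", 0,
         "http://example.com/joe/replies/2",
         "Spent the afternoon hang gliding in Newfoundland"),
]

gliding_results = search(archive, "hang gliding")
mixed_results = search(archive, "hang gliding Prince Edward Island")
```

A search for hang gliding now points straight at the second post, and the mixed query that fooled the page-level index matches nothing.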
This is not a new idea – it’s the semantic web and it’s been coming for a while. Here’s the key: The weblog community is in a unique position to effect significant change through nimble and collective action.
Think about it – if some kind of markup could be defined for this, it wouldn’t take years to be adopted (like most standards). Rather, it would take the cooperation of a few key weblog players. If Blogger, Movable Type, GreyMatter, and the UserLand suite all started pumping out BlogML-enhanced HTML, it would be instant critical mass. The majority of weblogs would be on board in a matter of days and it wouldn’t take long for the rest of us to jump on the bandwagon. Search tools like Google, BlogDex, and DayPop would be able to offer better search results to their customers.
Webloggers are in a unique position to take collective leaps and bounds towards the semantic web.
Ok, so how do you actually do this? What is BlogML? What does it look like? I’m not sure – I’m shooting from the hip here. Perhaps a simple markup could be hidden in HTML comment tags. Or perhaps a set of reserved DIV and SPAN IDs could be established (e.g. <div id="BlogMLtitle">), as this would be an appropriate use of the ID attribute according to the W3C specs.
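For the reserved-name idea, here is a minimal sketch of how an indexer might pull individual posts out of an archive page, using Python's standard `html.parser` module. The names `BlogMLpost`, `BlogMLtitle`, and `BlogMLcontent` are invented for this example, since no BlogML vocabulary exists. The sketch also swaps in the class attribute for the ID attribute: an ID value has to be unique within a page, and an archive page holds many posts, so class is the assumption made here.

```python
from html.parser import HTMLParser

# Hypothetical reserved class names mapped to post fields.
FIELDS = {"BlogMLtitle": "title", "BlogMLcontent": "content"}

class BlogMLParser(HTMLParser):
    """Collect one dict per post. Handles flat fields only: a nested tag
    inside a field ends that field early."""

    def __init__(self):
        super().__init__()
        self.posts = []    # one dict per post found on the page
        self.field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls == "BlogMLpost":
            self.posts.append({})          # a new post begins
        elif cls in FIELDS and self.posts:
            self.field = FIELDS[cls]       # start reading a field of the current post

    def handle_endtag(self, tag):
        self.field = None                  # any closing tag ends the current field

    def handle_data(self, data):
        if self.field:
            post = self.posts[-1]
            post[self.field] = post.get(self.field, "") + data

page = """
<div class="BlogMLpost">
  <span class="BlogMLtitle">Golf on Prince Edward Island</span>
  <div class="BlogMLcontent">Played eighteen holes today.</div>
</div>
<div class="BlogMLpost">
  <span class="BlogMLtitle">Hang gliding in Newfoundland</span>
  <div class="BlogMLcontent">Launched from the cliffs this morning.</div>
</div>
"""

parser = BlogMLParser()
parser.feed(page)
posts = parser.posts  # two separate posts, each with its own title and content
```

Because the markers are ordinary attributes, browsers would render the page exactly as before, while a crawler that knows the reserved names could index each post on its own.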
Implementation can be dealt with. What I want to know first is: Does this make sense? Why hasn’t it already been done? Has it already been done? What am I overlooking? I look forward to your feedback.