Writing semantic markup: Robots to the rescue!

Some very smart people think that the next big leap in web technology will be on the foundation of the Semantic Web. However, some other very smart people are raising concerns that this semantic utopia may be unattainable.

Matthew Thomas is an interface designer from New Zealand. Yesterday on his website, he posted a summary of a few of these smart people’s concerns about the move towards semantic markup on the web. The biggest problem is that people just don’t care about the semantic web. It takes an essay just to explain what the semantic web is – but that doesn’t mean it’s not a worthwhile idea.

I’m sympathetic to Thomas’ points here. I’ve been working to move a web-based system to the XHTML standard. On top of the usual CSS struggles (my mind still thinks in [table] tags, but I’m slowly learning to love CSS), I’m running into a difficult problem. On this particular web system (and on many, if not most, web systems), the users generate most of the content.

First of all, the web is a crappy medium for writing. It’s good for publishing what you write, but it is terrible at the actual writing stage. Spell checking, periodic backup, saving drafts, etc. – all features we’ve grown accustomed to in word processing – are sitting there, in the next window, just a few pixels away from our arcane DOS-esque text-only [textarea] form input box. Lame.

To fix that, we need the browser makers to put better text-editing tools at our disposal. However, here’s where it gets a little complicated. You’ve probably heard hot-shot web developers scoffing at WYSIWYG web editors before. This is mostly because they produce messy and convoluted code. There is a deeper problem, though. The web is not a WYSIWYG medium. The whole idea of XHTML and CSS is that you can separate design from content – style from meaning. WYSI-not-WYG.

A simple (inane) example: I recently posted a reply to a post on the Signal vs. Noise weblog. I included a quote in my reply. I used the [blockquote] tag to indicate which part of my reply was a quote. When I submitted the post, I was pleasantly surprised to see that our friends at Signal vs. Noise had included some nice formatting for the blockquote tag in their stylesheet. As a result, my quote was nicely formatted to fit in their style and layout.

There is a powerful idea behind this simple example. When I used the [blockquote] tag, I wasn’t ‘formatting’ my post. I was adding meaning to the text – I was using machine-readable language to tell web browsers that the next few words are a quote. I didn’t know exactly what it was going to look like. (Note: there are better ways to cite a quote, but this example makes the point)
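To make the distinction concrete, here is a minimal sketch in Python. The two stylesheets and the function names are made up for illustration and are not taken from Signal vs. Noise or any other real site. The point is only that the markup I write stays the same while each site decides how it looks.

```python
from html import escape


def mark_quote(text: str) -> str:
    """Wrap text in a blockquote: this says what the text is, not how it looks."""
    return f"<blockquote><p>{escape(text)}</p></blockquote>"


# Two hypothetical site stylesheets (assumptions, not from any real site).
SITE_A_CSS = "blockquote { border-left: 3px solid #999; padding-left: 1em; }"
SITE_B_CSS = "blockquote { font-style: italic; background: #eee; }"


def render_page(css: str, body: str) -> str:
    """Combine a site's stylesheet with the author's unchanged markup."""
    return f"<html><head><style>{css}</style></head><body>{body}</body></html>"


comment = mark_quote("The web is not a WYSIWYG medium.")
page_a = render_page(SITE_A_CSS, comment)
page_b = render_page(SITE_B_CSS, comment)
```

Both pages contain exactly the same `comment` string. Only the stylesheet around it differs, which is why I could not know in advance what my quote would look like.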

I’m not sure we can expect everyone to make this distinction. I do think, however, that people can produce writing with semantic markup if the software does the hard work.

We need a semantic-friendly WYSIWYG text editor for the web. Here are some proposed features:

  • Hide the code from the writer (but make it accessible to those who want it – as many current editors do).
  • Provide only semantic tools: lists, blockquotes, citations, links, emphasis, strong, etc.
  • Not quite WYSIWYG: show the text in real time in a typically styled format – perhaps even adopting the style of the destination website.
  • Automate the creation of meaningful markup. For example, when a link is created, prompt the author for a descriptive link title.
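Two of these features are easy to sketch in code: offering only semantic tools, and insisting on a descriptive title when a link is created. The following is a rough Python illustration, not a finished editor. The tag allowlist, the attribute allowlist, and the names `make_link`, `SemanticFilter`, and `clean` are all my own assumptions.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allowlist: elements that describe meaning only (no font, b, center).
SEMANTIC_TAGS = {"p", "ul", "ol", "li", "blockquote", "cite", "a", "em", "strong"}
# Assumed attribute allowlist per element; everything else is dropped.
ALLOWED_ATTRS = {"a": {"href", "title"}, "blockquote": {"cite"}}


def make_link(href: str, text: str, title: str) -> str:
    """Build a link, refusing to continue without a descriptive title."""
    if not title.strip():
        raise ValueError("a descriptive link title is required")
    return f'<a href="{escape(href)}" title="{escape(title)}">{escape(text)}</a>'


class SemanticFilter(HTMLParser):
    """Keep allowlisted tags and attributes; keep the text of everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in SEMANTIC_TAGS:
            return  # drop the presentational tag itself, its text still passes through
        allowed = ALLOWED_ATTRS.get(tag, set())
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in allowed
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in SEMANTIC_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))


def clean(markup: str) -> str:
    """Strip pasted-in presentational markup down to the semantic allowlist."""
    parser = SemanticFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For example, `clean('<p><font color="green">Hi</font> <em>there</em></p>')` keeps the paragraph and the emphasis but discards the font tag. This is only a sketch of the idea: it does not check that tags are properly nested, and it is not a security filter (it does not vet link targets, for instance).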

By the way, someone has come up with an apt name for what I’m doing here. It’s called the LazyWeb – when smart-asses like me rant and rave, but don’t do anything about it. The hope is that through the LazyWeb, people willing to write code and implement can meet up with the idea (read: lazy) people.


13 thoughts on “Writing semantic markup: Robots to the rescue!”

  1. Interesting ideas, however if your proposed editor hides the semantic “code”, then the user will see formatting instead of semantics. If they think they are formatting then they will likely expect WYSIWYG and not be happy with non-WYSIWYG results. Is it possible to have a semantic editor without exposing the user to some form of identifiers representing the semantics?

  2. You’re absolutely right, Nathan. I’m not sure how best to deal with that. A few thoughts I’ve had would be to show the output in a variety of different stylesheets (simultaneously, or sequentially, maybe) or to use a common base set of standards (everyone knows about underlined blue links – we could expand conventions like that). The idea of actually showing some kind of identifiers might be the only realistic way to do it.

  3. After thinking about Nathan’s comment a bit more – I think he’s really hit the nail on the head. As Nathan put it:

    Is it possible to have a semantic editor without exposing the user to some form of identifiers representing the semantics?

    This is the key question. I’m not sure.

  4. I think actually it’s quite easy. It is the LyX editor’s approach. People are accustomed to having an automatically generated Table of Contents from their Chapter and Section headers, or having “book” or “article” formats from a “Document template”.

    The key factor (which LyX uses perhaps too radically) in migrating them to the semantic web would be to hide formatting tools and force them to compose the document in a semantic way, and then give them a good post-processing tool to transform the semantic document into a final version. This way they would easily get used to thinking of formatting as a separate step.

  5. 18 months ago I developed an XHTML editor that’s pretty much what’s been described for my day job employers. We looked at various editors for our CMS systems but they all seemed happy to allow users to inflict green Comic Sans on the world if they saw fit. Like many editors it simply uses Win IE’s contentEditable feature, but with only basic semantic markup available and filtering for rubbish inserted by IE or pasted in.

    The next version is likely to feature subtle box outlines/fills to help make structure clearer (e.g. to show that 5 lists each with 1 item is different from 1 list with 5 items). I reckon most aspects of a simple XHTML document can be conveyed OK, it’s only if you want to cover nested divs, complex tables and forms that it gets difficult. Most users require some level of help/training initially, as it’s unusual for them to be forced to think in terms of structure when writing, but only a minority seem to fail to grasp the basic concepts at all.

  6. I agree with this trin of though. Pershap the HTML textarea should fire a browser plugin allowing the user to use whatever local text editing and semantic markup facility he has configured. This is immanently extensible at the client end, even to the point of looking up phrases in the text in a database that associated them to a URL.

  7. i don’t agree with “the web is a crappy medium for writing”. once u have ur layout planned u can be semantic. its those little things that coung. don’t put up text w/o para n stuff.

  8. And don’t forget semantic is not about [cite][/cite] tags only… it’s about more than that… including lang=”…”, q, and a lot of markupt who people doesn’t care about…

    I write semantically, including urls, adding hreflang=”…” and xml:lang=”” for every element i like….

    Making a WYSIWYG web-editor for posting and semantically loaded it should be really hard. really. more if you can’t see the tags your’s using…

    lot more….

  9. This thread proves Steve’s point about the web being “a crappy medium for writing”.

    mini-d: “semantically loaded it should be really hard. really. more if you can’t see the tags your’s using”. What? I’m not sure having a better editor would help this statement, but it would at least pick up the blatant problems like your’s.

    Seth: “I agree with this trin of though“.

    Don’t be confused and think that Steve means that it’s a crappy medium for displaying writing. Steve says himself “it’s good for publishing what you write”. There are some decent web-based editors out there, but they are really hacks of java-script and dhtml or plug-ins or Java applets.

  10. To mini-d (from a Diego to a Diego): most tags should be automatically added by the editor from templates and default configuration values, and formatting should be done in a separate tool (for example a visual CSS editor).

    As I envision it, when the next Mozilla 1.# release comes with a content aggregator and semantic agents, and Composer has semantic publishing tags, a few bloggers will start the snowball by semantically decorating their posts. The rest of the readers will notice these additional features and will want to copy them.

    Or so I hope it’ll happen… 8)

  11. I’m not sure that the [blockquote] kind of tags have anything to do with semantics.

    Semantics is about meanings, and meaning requires dictionaries, or what the AI community calls ontologies. The [blockquote] tag doesn’t refer to a concept; it is just a syntactic annotation of text.
    Another kind of syntactic annotation of text is the XML-based separation between data and presentation. XML annotation is not semantic annotation, because it needs a human brain to understand the meaning of the tags… A dumb computer program will not make any connection between a [blockquote] tag and the concept of a quote. Your tag may have been [bq] or [b_quote] or [whatever]; that makes no difference.

    The semantic web is about the idea of defining concepts and building relations between them (ontologies), then annotating web pages with those concepts. Then, our computer program (while still dumb) will be able to “understand” that the tag [name] is related to [person], which is related to [human], which is related to [animal], which is related to [living_creature] … When we add some rules (e.g. RuleML) like:
    if a person’s name is john then he is a man
    the computer program (let’s call it an agent, since it is beginning to be less dumb) may be able to associate the person’s name with his sex.

    This is what it is all about.

    Semantic web technology is still in the labs; some effort is being made to create standards, demos and software.

    Ontology standards: RDF, DAML+OIL, W3C OWL

    Semantic annotation tools: http://annotation.semanticweb.org/tools

    Another area of interest is the semantic web services … but this is another story.
