
Sunday, April 02, 2006

SchemaWeb - Browse

The Social Web, Semantic Web, and World Wide Web have all advanced greatly in the past two years.

The technical standards and associated tools underpinning them did not really spring up along with them. Those things were already there from previous years - in some cases, from the previous decade.

What did seem to appear alongside them was some web programming frameworks, in a few cases, and a heightened interest in scripting languages as opposed to standard IT/MIS production languages.

One of the biggest contributors to the meteoric rise of the social web in the past several years has been RSS. The RSS document standard is basically a way of recording and interpreting What's New.

The RSS standard dates back to the late 1990s, when Netscape introduced it to power the channels on its My Netscape portal.


Anyway, people started using RSS to list what was new on their news sites, their weblogs, their corporate sites, their cartoon sites, their photo sites, and their social activity sites. Once that happened, they all began to feed on each other. Yes, pardon the pun.

The interdependency and interconnectivity of so many websites' content greatly increased the ability of people to see what is going on. That really changed the web.

Beyond what RSS provides - a list of items related to a topic in something called a channel - there has not been a lot of progress in moving information from one website to another.
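
For the curious, here is a minimal sketch of what an RSS 2.0 channel looks like - the URLs are made up for illustration:

    <?xml version="1.0"?>
    <rss version="2.0">
      <channel>
        <title>Johnny's Software Saloon</title>
        <link>http://example.com/</link>
        <description>What is new on the site</description>
        <item>
          <title>SchemaWeb - Browse</title>
          <link>http://example.com/2006/04/schemaweb.html</link>
        </item>
      </channel>
    </rss>

A channel, a list of items, a title and link for each - that is really all there is to it.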


Screen-scraping is an old, fragile, ad hoc practice from the 1970s. It was invented to overcome the limitations of computers that could not conceive of sharing information with another computer. They would readily share information with a forms-based terminal that had a human on the other end of it, though.

So screen-scraping arose as a pragmatic trick for getting one hopelessly poorly-architected computer to speak to another by spoofing it into believing it was talking to a terminal.

It worked. The problem was, if anything changed about which fields were on the form, where they were laid out on it, or what format they were presented in, the whole thing would break.

That is exactly what happens when screen-scraping of websites is employed today. In a sense, it is actually worse: the typical HTML web page is riddled with errors, making it not a valid HTML document at all.


This was actually recognized as a problem a decade ago. So the Semantic Web was devised.


The Social Web is a slightly different creature. At first glance, it is not concerned with whether different computer programs can interpret what is going on with islands of information stores scattered all over the place. It only wants to see how people are connected, help them share information with each other, and let them observe what is going on.

A typical Social Web site will also let users collaborate on something. Examples include:
  • Amazon users helping each other determine which are the best books, records, and movies
  • All Consuming users doing the same thing with foods, as well as those items
  • Last.FM users sharing information about which songs are the best - be they up-and-coming pop bands, old classics from the 60s, or some obscure indie band that made a CD a couple of years ago that is starting to catch a wave of interest (perhaps thanks in no small part to Last.FM and its online community)
  • TV.com users identifying their favorite TV shows, the best (or worst) episodes, and sharing the latest news on favorite TV actors - plus building excitement about upcoming shows
  • Yahoo Movies users helping each other decide which are the best - and worst - movies out there, so people can gauge which movies to rent - and which ones to see (or miss) that are playing now at the theaters



Contrast that with old-fashioned HTML web pages. They come in three varieties:
  1. word processor documents
  2. interactive forms
  3. human-readable lists or tables of text


Web pages were too stupid to pass information about what was going on to each other. That is why the Semantic Web was created. Formats like RSS are one type of semantic solution. RSS mainly communicates about documents, though.

So a number of sites can read the latest bits of information from one site using screen-scraping or, now, RSS. However, reusable software with any deep understanding of what is going on at all of these sites is still not here.
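
To give a feel for how shallow that understanding is, here is a rough Java sketch that pulls the item titles out of a feed using the standard DOM parser - the feed URL is just a placeholder:

    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    // Reads an RSS 2.0 feed and prints the title of each item in its channel.
    public class FeedTitles {
        public static void main(String[] args) throws Exception {
            String feedUrl = "http://example.com/rss.xml"; // placeholder feed address
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(feedUrl);
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String title = item.getElementsByTagName("title")
                        .item(0).getTextContent();
                System.out.println(title);
            }
        }
    }

The code only knows the shape of the document, not what the items mean. That is the gap the rest of the Semantic Web is meant to fill.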


That is where the rest of the Semantic Web comes in. The SchemaWeb website catalogs all of the ontologies out there that are part of the Semantic Web. There are a lot of them.

Each ontology represents the definition of a different portable data model. These data models define the formats for self-contained descriptions of related things which can then be put in a document and stored or passed around.

The more sites that use these ontologies, the more sites will be able to transfer not only lists of words or hyperlinks - but knowledge of things about which they all share a common understanding. They will be able to use the same software to exchange, interpret, format, sort, filter, and edit that information.
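
Here is a rough sketch of that idea using Jena - the data URL is invented. The same few lines of Java can load a document written against any ontology and walk through every fact in it, because everything boils down to uniform subject-predicate-object statements:

    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;
    import com.hp.hpl.jena.rdf.model.Statement;
    import com.hp.hpl.jena.rdf.model.StmtIterator;

    // Loads an RDF document and prints every statement in it -
    // no per-format parsing code, whatever ontology the data uses.
    public class DumpStatements {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            model.read("http://example.com/data.rdf"); // invented URL
            StmtIterator it = model.listStatements();
            while (it.hasNext()) {
                Statement s = it.nextStatement();
                System.out.println(s.getSubject() + " "
                        + s.getPredicate() + " " + s.getObject());
            }
        }
    }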

Since the format is a standard, they can pop out one software module that does some of these things - and pop in another. The software will be able to evolve quickly. That is because the data formats and the software are not tightly coupled to each other.


It sounds like there would be a need for a general-purpose program that can read in one of these wonderful ontology files. It would be amazing if it then let a user immediately start banging in data - and executing searches and reports - without writing any code or even talking to a programmer.

That tool exists, and I have been using it for about 4 or 5 years. Its name is Protege and it is free. It is written in Java, so it will run on any desktop computer (Windows, Linux, Macintosh, Sun, IBM, etc.). In fact, it can also run on a web server - in which case the forms it makes are web pages.

Protege takes care of the drudge work of creating forms for add, edit, delete, link, unlink, search, and query. That lets you concentrate on the most important thing: information. You do not have to waste a lot of time writing procedural code. The whole thing works declaratively, driven by the .owl files you get from the SchemaWeb site - as well as the very smart software built into Protege.

Protege is a relevant topic now for another reason. They are up for getting their funding renewed, as it must be periodically. So they are asking everyone who is using it to write up a little bit of information about what they are doing with it. That will help their patrons determine that this is indeed a beneficial software program that should continue to improve.

It is, so try it out quickly - and tell them. Protege has a turnkey installer and is easy to figure out. It is almost magical how it meets your needs without you having to explain them. You just feed it an ontology and go to work.

Oh, and it helps you create your own ontologies very quickly. If you have ever done any object-oriented programming, you will take to it like a fish to water. If you haven't, but you are familiar with the biological approach to classifying things in an organized system (kingdom... class... species), it is sort of like that, but more flexible and with less memorization required.
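
For the object-oriented crowd, here is a rough Jena sketch of building just such a little taxonomy in code - the namespace is invented for the example:

    import com.hp.hpl.jena.ontology.OntClass;
    import com.hp.hpl.jena.ontology.OntModel;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // Builds a tiny three-level taxonomy, kingdom/class/species style.
    public class TinyTaxonomy {
        static final String NS = "http://example.com/zoo#"; // invented namespace

        public static void main(String[] args) {
            OntModel m = ModelFactory.createOntologyModel();
            OntClass animal = m.createClass(NS + "Animal");
            OntClass mammal = m.createClass(NS + "Mammal");
            OntClass cat = m.createClass(NS + "Cat");
            animal.addSubClass(mammal); // a Mammal is a kind of Animal
            mammal.addSubClass(cat);    // a Cat is a kind of Mammal
            m.write(System.out, "RDF/XML-ABBREV"); // emit it as an RDF/XML document
        }
    }

In Protege you get the same result by pointing and clicking, of course - no code at all.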

It is Sunday afternoon now. So go grab Protege, try a couple of the samples, and then do one more thing: grab an interesting-looking ontology, preferably in OWL format, from the SchemaWeb site - and try it out in Protege.

Cool, huh?!!


There is a lot more to Protege than what you normally see. Under the hood, there is a semantic web engine humming away. You can use super powerful technologies like Jena with it.

Jena includes support for SPARQL that is worth a good look. SPARQL can read OWL and RDF-S data and lets you write very simple-looking queries in a sort of SQL-like syntax. That eliminates the need to write a lot of complicated software to do commonplace things with each new data format that comes out.
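
As a taste, here is a rough sketch of running a SPARQL query with Jena. It uses the real FOAF vocabulary to ask for people and their names; the data URL is just a placeholder:

    import com.hp.hpl.jena.query.QueryExecution;
    import com.hp.hpl.jena.query.QueryExecutionFactory;
    import com.hp.hpl.jena.query.QueryFactory;
    import com.hp.hpl.jena.query.QuerySolution;
    import com.hp.hpl.jena.query.ResultSet;
    import com.hp.hpl.jena.rdf.model.Model;
    import com.hp.hpl.jena.rdf.model.ModelFactory;

    // Runs a simple SQL-looking SPARQL query over an RDF file.
    public class SimpleQuery {
        public static void main(String[] args) {
            Model model = ModelFactory.createDefaultModel();
            model.read("http://example.com/people.rdf"); // placeholder data

            // Find every resource that has a name, and the name itself.
            String sparql =
                "PREFIX foaf: <http://xmlns.com/foaf/0.1/> " +
                "SELECT ?person ?name WHERE { ?person foaf:name ?name }";

            QueryExecution qe = QueryExecutionFactory.create(
                    QueryFactory.create(sparql), model);
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.nextSolution();
                System.out.println(row.get("name"));
            }
            qe.close();
        }
    }

Swap in a different model and the same query keeps working - no custom parser required.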

That is what the Semantic Web is all about. Stop reinventing the wheel. Just grab or make an ontology, and start using it to do all the things you normally do - without a long delay up front to write a lot of code.


Who knows? Maybe you will grab Protege and a couple of existing ontologies, and write your own Social Web site or website tool. Then you could be the next millionaire. Aren't you glad you did not have to write a lot of software to do it?
