Paul Walsh

A real Semantic Web browser, enabling trust on the Web

 Posted on January 2, 2008 at 9:58 pm |  By Paul Walsh

I picked up on an interesting post about attention data from Damien Mulley’s blog. Whilst writing a comment on his post, I realised it was turning into an epic. It presented me with an opportunity to talk about Segala’s Semantic Web Firefox Trust extension too, so I’ve decided to write here and link to Damien’s blog instead of posting a comment on his.

The most applicable point for me in Damien’s post was:

It was in a past blog post here where I said that if we controlled our activity data, we could actually make money from search engines and the likes of Microsoft HealthVault, so there’s potential there. So I was quite interested when Mozilla announced Weave, their system which will store your Firefox preferences on their servers and when you install a new Firefox on a new computer, it can go to the Mozilla servers and download all your preferences and bookmarks.

According to Mozilla:

Weave overview

The idea behind Weave is that all your personal information — bookmarks, passwords and account names, for example — are synced to your Mozilla account via Firefox. If you lose your computer, you can download Firefox, log into your account and you can restore all that information. You can do some of this today if you use Google Browser Sync and Dot Mac services. You can start by creating an account with Mozilla Services. You will need Firefox 3.0 or higher to get this working.

This is relevant to me as the functionality behind Mozilla Weave has been available in Glaxstar’s Firefox browser for more than 2 years. When I say available, I’m referring to every single last detail. Whilst Glaxstar’s Glubble browser is new, I’ve had insight into their technology for quite some time.

In my opinion, Glaxstar is possibly the only development company in the world that could build a Firefox browser to compete with Mozilla’s (Flock is a 1.0 effort compared to what these guys can do!). That’s if Ian decided to take that route. As it happens, he’s just interested in helping guardians to protect their loved ones from inappropriate content.

Note that I didn’t say help to protect minors, or help governments protect people. That’s not his job. It’s not Google’s job, it’s not Segala’s job and it’s not the Government’s job either. Ian’s job is to help guardians who are responsible for deciding what’s appropriate and inappropriate for the people they’re responsible for. Technology should be perceived and used as an enabler, not a prohibiter. Furthermore, what a guardian in Germany deems appropriate is not likely to be the same as what a guardian in the UK deems appropriate, for example. This is why I’d like people to perceive Content Labels as an enabler to help mainstream search engines and browsers to provide better content discovery, not a method for policing the Web.

So, I wouldn’t be surprised if Glaxstar gave the Weave code to Mozilla, given that they’ve had it for more than a couple of years and have built mainstream Firefox browser extensions for companies such as Google, Yahoo!, PayPal and eBay. They also maintain spreadfirefox.com and are responsible for resolving defects in the mainstream Firefox browser. In my opinion, that makes Glaxstar the most qualified company in the world to build Firefox add-ons.

Luckily for me, Ian Howard, Founder of Glaxstar, is a personal friend of mine. So, who better to build Segala’s Firefox trust extension (not plug-in, that’s something different), Search Thresher? Our extension really is based on the Semantic Web, unlike the claims made by many of the so-called Semantic Web search engines.

Sorting the wheat from the chaff

As I’ve said, Glaxstar and Segala have been working together for the past couple of years, although we haven’t updated our extension in over a year (I guess that demonstrates how ahead of the curve we’ve been). As of February though, you should expect to see regular updates for our Trust extension.

Search Thresher is just one of the pieces in our jigsaw to help demonstrate why and how we feel very confident that 2008 is the year to tell Segala’s story. You will notice me talking less about the conferences that I host and chair, and more about our Semantic Web method of classifying content.

What’s with the name?

The thrashing machine, or, in modern spelling, threshing machine (or simply thresher), was a machine first invented by Scottish mechanical engineer Andrew Meikle for use in agriculture. It was invented (c.1784) for the separation of grain from stalks and husks.

For thousands of years, grain was separated by hand with flails, which was very laborious and time-consuming. Mechanization of this process took the drudgery out of farm labour.

Today, searching the Web is equally laborious. You may or may not find what you’re ‘searching’ for and even when you do find what you want, can you trust what you find?

Think of Search Thresher as a threshing machine. It’s a Firefox extension used to demonstrate to search engines and mainstream browsers how they can (and should!) provide users with more trust on the Web using a method called Content Labelling.

We haven’t touched the extension for over a year as we’ve been focused on other stuff that I’ll tell you about soon. If you’re a designer and would like to be recognized for your work, please feel free to volunteer your services to rebrand the Web site. Search Thresher is a non-profit, standards-based browser extension, so this may be of interest if you’re a standards enthusiast.

We’re not emotionally attached to the name Search Thresher. What do you think of it? We’re open to suggestions if you can propose something better.

Read more about Content Labels - this post also includes sample use cases.

Watch a quick video about Content Labels

There are currently 7 Comments on this post

  • January 3, 2008 @ 12:00 pm

    The real problem with having a semantic web browser and the problem that has dogged the semantic web in general is that almost no web sites currently attach semantic annotations to their content. And for publishers there is very little benefit in adding semantic annotations to their content since there are no real applications that use them.

    So if you’re going to use semantic annotations to filter content you’re limited to a teeny tiny pool of sites from which to take content.

    How can you overcome this problem?

    Coming from a machine learning/NLP background I think that the solution is to build more intelligent web crawlers that can automatically recognize different types of content and generate the annotations automatically. This scales much better as it doesn’t depend on the presence of pre-existing annotations.

  • January 3, 2008 @ 1:03 pm

    @Aidan - you’re absolutely right. I’ll answer your question in more detail when we tell our story in full, rather than in bite-size chunks where we talk about each part of the puzzle out of context.

    Until then, we’re building a partner network of companies (similar to VeriSign) that design, build and test Web sites. We enable partners to audit and certify Web sites - each will be labelled using a Content Label (the Semantic stuff to which you refer).

    We have almost completed the build of an application that automatically generates Certificates and Content Labels - cutting the process time down from 4.5 hours to under 2 minutes. We intend to give this away for free to other Trustmark Providers to enable them to provide Content Labels.

    Use cases include
    - Child Protection - ICRA already label sites using Content Labels
    - Web Accessibility compliance
    - mobileOK compliance (new W3C standard)
    - Creative Commons (they think it’s a great idea)
    - Privacy
    - Copyright
    - Medical
    - iPhone-ready (Apple is looking into this)
    - lots more

    So, as you can see, we’re building an entire ecosystem, and we intend to demonstrate how it works using the extension.

    BTW, our extension is now endorsed by the W3C as one of four applications to demonstrate real implementations of the Semantic Web - as voted by people like Danny Ayers etc.

    Does that help to paint the picture? As you can see, 2008 is going to be a busy year. But we need people like you to believe in Content Labels and help by evangelising them :)

  • January 5, 2008 @ 9:22 am

    Coming from an agricultural background myself I really like the thinking behind the name, Search Thresher. It seems like a perfectly descriptive label and better still it ‘tells a story’. However…. I just can’t say it without stumbling over it. Just like I can’t say “sunshine and showers” quickly (I end up saying sHunshine and showers or sunshine and sowers). For me Search Thresher just doesn’t trip off the tongue, but perhaps that’s just me.

  • January 5, 2008 @ 11:11 am

    Me neither, James. I originally decided not to put the word ‘Trust’ in the name because we were unable to get a domain to match. However, I don’t think that’s necessary anymore. Unfortunately I can’t claim to have come up with Search Thresher; that was Ian Howard from Glaxstar.

  • January 5, 2008 @ 6:54 pm

    Hi Paul,

    I’d be interested in hearing how your application for automating the creation of a content label works. E.g. is it fully automated, or a tool for making a human more efficient in creating the content label?

    I can see how content labels are useful for each of the use-cases that you describe. My understanding of them is that they are suitable for making statements about a website that are unambiguously verifiable. Stuff like “this site X conforms to accessibility guidelines Y”.

    This is fine for when you want to do search over a small well-defined subset of the web where sites in that part of the web have taken the time to add semantic annotations.

    But semantic browsing or searching is limited to sites that have semantic annotations and that’s a tiny portion of the total web. And it’s likely to be a tiny portion of the web for some time to come. This is not really a problem of the technology - it’s a practical one of adoption. Anyway, I look forward to hearing more details of the technology and seeing applications of it over the next year.

  • January 5, 2008 @ 8:32 pm

    @Aidan - you click a few boxes and input a little text such as URL(s) and then click save. The application automatically creates the Content Label and places it on our server. It creates the visual certificate (if applicable for the assertion) and generates the relevant Trustmark. The user is immediately emailed the link tag to include in each template and a small piece of code should they wish to tweak the server instead of labeling each template. They also receive the code to display the Trustmark. Everything is automatically linked. The process of choosing the claims etc. takes just a few minutes. The time it takes to create the Content Label is almost instant.
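
    As a rough illustration of that last step, the link tag a site owner pastes into each template could be generated along these lines. This is only a sketch in Python, not Segala’s application: the `ContentLabel` record, the `link_tag` function and the example URL are hypothetical, and the default `rel` and `type` values are assumptions modelled on the W3C POWDER work, so they may differ from what actually gets emailed out.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ContentLabel:
    """Hypothetical record for one generated label (not Segala's data model)."""
    label_url: str            # where the generated label file is hosted (assumed)
    claims: tuple[str, ...]   # verifiable claims, e.g. "mobileOK"


def link_tag(label: ContentLabel,
             rel: str = "describedby",                      # assumed default
             media_type: str = "application/powder+xml"     # assumed default
             ) -> str:
    """Build the <link> element that points a page at its Content Label."""
    # Escape every attribute value so characters such as & or " in the URL
    # cannot break the surrounding markup.
    return '<link rel="{}" href="{}" type="{}" />'.format(
        escape(rel, quote=True),
        escape(label.label_url, quote=True),
        escape(media_type, quote=True),
    )


if __name__ == "__main__":
    example = ContentLabel(
        label_url="http://labels.example.org/site.xml",  # placeholder URL
        claims=("mobileOK", "Web Accessibility"),
    )
    print(link_tag(example))
```

    Keeping the claims in the label file and only a pointer in the page is what lets one label cover every template on a site.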

    Yes, Content Labels are for unambiguously verifiable guidelines. It’s not for stuff like ‘My Web site is well designed’. This wouldn’t be useful or practical.

    Take your mind away from ‘Semantic Browsing’. We’re providing a solution to a specific problem - that is, enabling more ‘trust’ on the Web – we want to provide more information about the suitability of the content on Web sites and not just the title and description.

    We’ve been working on the ecosystem for a couple of years now and have other boxes to tick to help demonstrate why we think mass adoption won’t be a huge problem.

    Content Labels are going through the W3C to become a ratified standard. This is called POWDER for political reasons I won’t bore you with unless you specifically want to know. So, Content Labels will replace PICS – the first W3C recommendation, which is used by IE today for filtering content.

    So, I don’t expect much of the long tail to be labeled, but those are not likely to be the sites you’d need to trust as we see it.

    What will make all of this easier is if we can persuade people like you that it’s a good thing so you help by evangelising – just like the Microformats guys did :)

    We need to redesign and launch http://contentlabel.org and we’re having the logo/icons for Content Labels designed currently.

    Keep the questions coming!

  • January 5, 2008 @ 8:36 pm

    I forgot to mention that we’re going to give our application away for free to any Trustmark Provider that currently exists - this will help bring current providers up to our level ;)
