The "Semantic Web"

The "next big thing" in information retrieval?

We've seen boolean queries

They are non-fuzzy and don't rank

We've improved that with vector queries

They make partial matches, hence can rank

We've improved that with link analysis

It greatly improves ranking through social filtering

 We still have not detected some important matches:

 If query was "phone number"...

 ...then 294-3959 would not be a match...

 ...but it should be (perhaps not top ranked)

 If query was "pets"...

 ...then "dogs" should be a match

 If query was "dogs"...

 ...should "pets" be a match?

 If query was "colds"...

 ...then "respiratory illnesses" should be a match

 

If query was vector <587, syllabus>

   Should this course's home page be a match?

   Should pages about Flight 587 be a match?

   What about the Cottonwood Hills Public Golf Course, 406-587-1118?
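
To see why word-based vector queries miss these matches, here is a minimal sketch (not any particular search engine's method). It assumes plain bag-of-words vectors and cosine similarity; the two sample documents are made up for illustration.

```python
import math
from collections import Counter

def bag(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query = bag("pets")
doc_dogs = bag("dogs need daily walks")   # hypothetical document
doc_pets = bag("pets need daily care")    # hypothetical document

score_dogs = cosine(query, doc_dogs)  # no shared word, so no match at all
score_pets = cosine(query, doc_pets)  # shares the word "pets"
```

The "dogs" document scores zero because it shares no word with the query, even though its meaning is clearly related.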

 

Semantic

Semantic - "of or pertaining to meaning..." (Webster's New World Dictionary, 2nd College Ed.)

The common theme of the above matching issues is meaning

We query with words

But we are (almost always?) not interested in words

We are interested in meanings! 

This is in part why vivisimo.com works so nicely

Often, a word's neighbors determine its meaning

Therefore, it would be good if the Web...

...used meanings instead of words

Because the Web is hypertext, words will not go away

But, we can try to annotate with meanings

The better we do, the better the Web will work

If we do well, we can call the result

"The Semantic Web"

 

Creating The Semantic Web with Ontologies

An ontology is

a controlled vocabulary, and

clear, simple relationships among its terms

Dictionaries give relationships among terms

...but they are not clear and simple!

Let's consider some example ontologies

 

Ontology Example:

"person" isa "primate" isa "mammal" isa "vertebrate" isa "animal" isa "organism"

"dog" isa "canine" isa "mammal" isa ...

(add many, many more...)

Now, queries can work better

Questions can also be asked:

Is a person a mammal?
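
A minimal sketch of how such a question could be answered, assuming each term has at most one "isa" parent and the chain has no cycles (a real ontology may not satisfy either assumption). The function names are made up for illustration.

```python
# The isa links from the example above, stored as child -> parent
ISA = {
    "person": "primate",
    "primate": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "animal",
    "animal": "organism",
    "dog": "canine",
    "canine": "mammal",
}

def ancestors(term):
    """All terms reachable from `term` by following isa links upward."""
    found = []
    while term in ISA:
        term = ISA[term]
        found.append(term)
    return found

def is_a(term, category):
    """True if `category` lies somewhere above `term` in the isa chain."""
    return category in ancestors(term)

person_is_mammal = is_a("person", "mammal")
dog_is_primate = is_a("dog", "primate")
```

The same lookup lets a query for "mammal" also retrieve pages that mention only "dog" or "person".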

 

Ontology Example:

(a) Named concepts

"pot" - concave, rigid, thin-walled object

"heat source" - increases temperature of something

"person" -

"food" -

"stuff" -

"eat" -

What else in this domain?

 

(b) Relationships among concepts

"pot" - holds "stuff"

"heat source" - increases temperature of pot

"person" - deletes "food"

"food" - "stuff" that a "person" "eats"

"stuff" - "food" is-a "stuff"

"eat" - process of a "person" deleting "food"

What else?

We might, for example, represent things as nodes, relationships as edges

(let's try it)
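
One possible way to try it, as a sketch only: each relationship becomes a labeled edge (source, label, target). The edge labels are taken from the list above; the data layout and the function name are assumptions for illustration.

```python
from collections import defaultdict

# Each relationship is a labeled, directed edge between two concept nodes
EDGES = [
    ("pot", "holds", "stuff"),
    ("heat source", "increases temperature of", "pot"),
    ("person", "deletes", "food"),
    ("food", "is-a", "stuff"),
]

# Adjacency list: node -> list of (label, target) for outgoing edges
graph = defaultdict(list)
for source, label, target in EDGES:
    graph[source].append((label, target))

def related(node):
    """Nodes one edge away from `node`, ignoring edge direction."""
    outgoing = {target for _, target in graph.get(node, [])}
    incoming = {source for source, _, target in EDGES if target == node}
    return outgoing | incoming

pot_neighbors = related("pot")
```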

 

(c) Inference rules

If person p eats food f, there is less f

If pot t gets hot, stuff s in pot gets hot too

What else?

Now, some types of queries can be done better

E.g., questions can be answered

Queries about cooking can return pages about

pots

food

...because of the semantic relationships that are known to the system
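
A minimal sketch of applying the pot rule above by simple forward chaining. The fact format (subject, predicate, object) and the example "soup" are assumptions for illustration, not part of the ontology as given.

```python
def infer_hot(facts):
    """Apply repeatedly: if t is hot and t holds s, then s is hot too."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        hot = {s for s, p, o in facts if p == "is" and o == "hot"}
        for s, p, o in list(facts):
            if p == "holds" and s in hot and (o, "is", "hot") not in facts:
                facts.add((o, "is", "hot"))
                changed = True
    return facts

facts = {
    ("pot", "holds", "soup"),   # "soup" is a hypothetical kind of stuff
    ("pot", "is", "hot"),
}
derived = infer_hot(facts)      # now also contains ("soup", "is", "hot")
```

The loop repeats until nothing new is derived, so heat also propagates through containers nested inside containers.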

 

Recent Progress on the Semantic Web

About the World Wide Web Consortium (www.w3.org)

W3C promotes development of ideas and standards related to the Web and its future

History: "In October 1994, Tim Berners-Lee, inventor of the Web, founded the World Wide Web Consortium (W3C)" -  http://www.w3.org/Consortium/

 

"W3C's Goals

W3C's long term goals for the Web are:

  1. Universal Access: To make the Web accessible to all by promoting technologies that take into account the vast differences in culture, languages, education, ability, material resources, access devices, and physical limitations of users on all continents;
  2. Semantic Web: To develop a software environment that permits each user to make the best use of the resources available on the Web;
  3. Web of Trust: To guide the Web's development with careful consideration for the novel legal, commercial, and social issues raised by this technology." - from http://www.w3.org/Consortium/, 2004 (emphasis mine)

"Definition: The Semantic Web is the representation of data on the World Wide Web. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming." - http://www.w3.org/2001/sw/  

What W3C has been doing with respect to the Semantic Web:

"News and Events

 

RDF

RDF=Resource Description Framework

A language for describing metadata

Meta - "about"

Examples of Web metadata: content ratings, privacy specs, etc.

Graph-based  (like the Web is graph-based!)
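
To illustrate the graph idea only (this is not RDF syntax and not an RDF library), metadata can be sketched as (subject, predicate, object) statements with URIs as names. All example.org URIs and the rating values below are hypothetical.

```python
# Hypothetical property and resource names; URIs serve as names, as in RDF
RATING = "http://example.org/terms/rating"
AUTHOR = "http://example.org/terms/author"

# Each statement is one labeled edge in the metadata graph
TRIPLES = [
    ("http://example.org/page1", RATING, "general"),
    ("http://example.org/page1", AUTHOR, "http://example.org/people/alice"),
    ("http://example.org/page2", RATING, "adult"),
]

def match(subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

rated_pages = match(predicate=RATING)
general_pages = [s for s, _, _ in match(predicate=RATING, obj="general")]
```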

 

OWL

OWL=Web Ontology Language   (er, yes)

OWL allows describing ontologies such as those mentioned earlier

 

 

Some notes on: J. Hendler, Agents and the Semantic Web, IEEE Intelligent Systems, March/April 2001, pp. 30-37.

 

Existing Ontologies Available for Use

http://www.daml.org/ontologies/ .....

From that site:

"DAML Ontology Library

Summaries

Queries

Statistics

Send updates/corrections to ontology-librarian@daml.org. Submit additions here.

This catalog of DAML ontologies can also be viewed in XML and DAML formats.

 


Fri Apr 30 01:44:56 EDT 2004"

 

 

Getting Ontologies to be Used

Two ways to put (a) named concepts
throughout the Web

i. A universal set of concepts that everyone uses

ii. Everyone makes up their own concepts

How is a concept to be named? XML is one way

Which is more realistic, i or ii? XML suits ii; i is already present as words

 

 

Why would people intentionally name concepts?

    It's more trouble than not doing it (if using XML)

    But...it enhances the value of Web information

        Value enhancement is a motivation

    Problem: value is enhanced only if
    apps exist that use the concepts  

        Why specify the concepts if
        important apps don't exist?

    Problem: why write apps if concepts
    are not specified?

    This is a "chicken-and-egg" problem (Hendler)

        To solve, DARPA is funding research

        So DARPA is trying to be the chicken (or egg)

    Solution 1: DARPA

    Solution 2: words instead of XML

 

Making concept specification easier

    Tools that output HTML should do it automatically

    The author doesn't even know it

    Example:

        You put a phone number in a document

                (such as 294-3959)

        The editor I use should automatically

                - recognize it as a phone number

                - tag it as such, e.g.

        <phonenumber>294-3959</phonenumber>

                - I should not have to even know
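
    A minimal sketch of what such an editor might do behind the scenes. It assumes a phone number is just the 7-digit NNN-NNNN form; real phone formats vary much more widely, and the function name is made up.

```python
import re

# Assumption: only the 7-digit NNN-NNNN form counts as a phone number here.
# The lookarounds keep us from tagging part of a longer digit/hyphen string.
PHONE = re.compile(r"(?<![\d-])(\d{3}-\d{4})(?![\d-])")

def tag_phone_numbers(text):
    """Wrap each recognized phone number in <phonenumber> tags."""
    return PHONE.sub(r"<phonenumber>\1</phonenumber>", text)

tagged = tag_phone_numbers("Call 294-3959 today")
```

    Under this narrow pattern a 10-digit number such as 406-587-1118 is left untagged, which shows how much the recognizer's rules matter.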

    Example:

        The editor automatically classifies the
        document in a given category, and adds
        a subject tag to the document

        - the categories are determined by the editor's ontology

        - the classification is done by comparing its
          keyword vector to the category's vector

        - ...thereby connecting
          semantic categories to words
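
    A minimal sketch of this classification step, assuming cosine similarity between bag-of-words vectors. The two category vectors, their weights, and the <subject> tag name are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical category keyword vectors from the editor's ontology
CATEGORY_VECTORS = {
    "cooking": Counter({"pot": 3, "food": 3, "heat": 2, "eat": 1}),
    "pets": Counter({"dog": 3, "cat": 3, "food": 1, "walk": 1}),
}

def cosine(a, b):
    """Cosine similarity between two keyword vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(text):
    """Pick the category whose vector is closest to the document's."""
    doc = Counter(text.lower().split())
    return max(CATEGORY_VECTORS, key=lambda c: cosine(doc, CATEGORY_VECTORS[c]))

def add_subject_tag(text):
    """Prepend a subject tag naming the chosen category."""
    return f"<subject>{classify(text)}</subject>\n{text}"

category = classify("heat the pot and add food")
```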

    Example:

        You add an image of a computer to a doc

        The image is chosen from a list, using a menu

        The editor automatically labels it with a

            <computer> tag

         Actually this can be done for
         the word (not XML) approach very easily:

             - Just name the image e.g. computer5.gif

         (not e.g. image0091.gif)

             - Current image search engines
                already are based on file names
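
         A minimal sketch of recovering a keyword from an image file name, assuming the name is a single word followed by optional digits. The function name is made up.

```python
import os
import re

def keyword_from_filename(path):
    """Drop the directory, the extension, and any trailing digits."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.sub(r"\d+$", "", stem).lower()

good = keyword_from_filename("computer5.gif")   # descriptive name
poor = keyword_from_filename("image0091.gif")   # uninformative name
```

         The descriptive name yields a usable keyword, while the generic name yields only "image".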

 

Conclusion

Should/will future intelligent processing of the Web use ontologies or words?

Is an intermediate approach possible?

Should/do vivisimo.com and northernlight.com use ontologies or words?

Is an intermediate solution possible?

Drill-down directories (e.g. Yahoo, PathBinder) define their own ontologies

       (How?)