The "Semantic Web"
The "next big thing" in information retrieval?
We've seen boolean queries
They are non-fuzzy and don't rank
We've improved that with vector queries
They make partial matches, hence can rank
We've improved that with link analysis
It greatly improves ranking with social filtering
We still have not detected some important matches:
If query was "phone number"...
...then 294-3959 would not be a match...
...but it should be (perhaps not top ranked)
If query was "pets"...
...then "dogs" should be a match
If query was "dogs"...
...should "pets" be a match?
If query was "colds"...
...then "respiratory illnesses" should be a match
If query was vector <587, syllabus>
Should this course's home page be a match?
Should pages about Flight 587 be a match?
What about the Cottonwood Hills Public Golf Course, 406-587-1118?
Semantic
Semantic - "of or pertaining to meaning..." (Webster's New World Dictionary, 2nd College Ed.)
The theme to the above match issues is meaning
We query with words
But we are (almost always?) not interested in words
We are interested in meanings!
This is in part why vivisimo.com works so nicely
Often, a word's neighbors determine its meaning
Therefore, it would be good if the Web...
...used meanings instead of words
Because the Web is hypertext, words will not go away
But, we can try to annotate with meanings
The better we do, the better the Web will work
If we do well, we can call the result
"The Semantic Web"
Creating The Semantic Web with Ontologies
An ontology is
a controlled vocabulary, and
clear, simple relationships among its terms
Dictionaries give relationships among terms
...but they are not clear and simple!
Let's consider some example ontologies
Ontology Example:
"person" isa "primate" isa "mammal" isa "vertebrate" isa "animal" isa "organism"
"dog" isa "canine" isa "mammal" isa ...
(add many, many more...)
Now, queries can work better
Questions can also be asked:
Is a person a mammal?
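A minimal sketch of how the "isa" chains above could answer such a question, assuming each term has a single parent. The names ISA, ancestors, and is_a are illustrative, not part of any standard.

```python
# Each term maps to its immediate parent ("isa" link), as in the example above.
ISA = {
    "person": "primate",
    "primate": "mammal",
    "dog": "canine",
    "canine": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "animal",
    "animal": "organism",
}

def ancestors(term):
    """Return every term reachable from `term` by following isa links upward."""
    chain = []
    while term in ISA:
        term = ISA[term]
        chain.append(term)
    return chain

def is_a(term, category):
    """Answer questions such as 'Is a person a mammal?'."""
    return category in ancestors(term)

print(is_a("person", "mammal"))  # True
print(is_a("dog", "primate"))    # False
```

The same upward walk could let a query for "mammal" also match pages that mention only "dog" or "person".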
Ontology Example:
(a) Named concepts
"pot" - concave, rigid, thin-walled object
"heat source" - increases temperature of something
"person" -
"food" -
"stuff" -
"eat" -
What else in this domain?
(b) Relationships among concepts
"pot" - holds "stuff"
"heat source" - increases temperature of pot
"person" - deletes "food"
"food" - "stuff" that a "person" "eats"
"stuff" - "food" is-a "stuff"
"eat" - process of a "person" deleting "food"
What else?
We might, for example, represent things as nodes, relationships as edges
(let's try it)
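One possible way to try it in code: a sketch that stores the relationships from (b) as labeled edges. The edge labels "heats" and "eats" are shorthand chosen here for "increases temperature of" and "deletes"; build_graph and neighbors are hypothetical helper names.

```python
from collections import defaultdict

# (subject, relationship, object) triples taken from part (b) above.
EDGES = [
    ("pot", "holds", "stuff"),
    ("heat source", "heats", "pot"),   # "heats" is an assumed label
    ("person", "eats", "food"),        # "eats" is an assumed label
    ("food", "is-a", "stuff"),
]

def build_graph(edges):
    """Index the edges by subject so each node lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, relation, obj in edges:
        graph[subject].append((relation, obj))
    return graph

GRAPH = build_graph(EDGES)

def neighbors(node):
    """Return the nodes directly reachable from `node`."""
    return [obj for _, obj in GRAPH.get(node, [])]

print(GRAPH["pot"])        # [('holds', 'stuff')]
print(neighbors("food"))   # ['stuff']
```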
(c) Inference rules
If person p eats food f, there is less f
If pot t gets hot, stuff s in pot gets hot too
What else?
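The pot rule above can be sketched as a small forward-chaining loop. This is only an illustration under simple assumptions: facts are plain Python sets, and the example facts (soup, noodles) are made up for the demonstration.

```python
def infer_hot(holds, hot):
    """Apply the rule: if container t is hot and t holds s, then s is hot.

    holds: set of (container, contents) pairs
    hot:   set of things already known to be hot
    The rule is re-applied until no new facts appear.
    """
    hot = set(hot)  # copy so the caller's set is not modified
    changed = True
    while changed:
        changed = False
        for container, contents in holds:
            if container in hot and contents not in hot:
                hot.add(contents)
                changed = True
    return hot

# Hypothetical facts: the pot holds soup, and the soup holds noodles.
holds = {("pot", "soup"), ("soup", "noodles")}
print(sorted(infer_hot(holds, {"pot"})))  # ['noodles', 'pot', 'soup']
```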
Now, some types of queries can be done better
E.g., questions can be answered
Queries about cooking can return pages about
pots
food
...because the system knows these concepts are semantically related
Recent Progress on the Semantic Web
About the World Wide Web Consortium (www.w3.org)
W3C promotes development of ideas and standards related to the Web and its future
History: "In October 1994, Tim Berners-Lee, inventor of the Web, founded the World Wide Web Consortium (W3C)" - http://www.w3.org/Consortium/
"W3C's Goals
W3C's long term goals for the Web are:
"Definition: The Semantic Web is the representation of data on the World Wide Web. It is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners. It is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming." - http://www.w3.org/2001/sw/
What W3C has been doing with respect to the Semantic Web:
Data:
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> "SPARQL Tutorial" .
Query:
SELECT ?title
WHERE { <http://example.org/book/book1> <http://purl.org/dc/elements/1.1/title> ?title . }
Query Result:
| title |
|---|
| "SPARQL Tutorial" |
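To show roughly what such a query does, here is a sketch of triple-pattern matching in plain Python. It is not a SPARQL engine and uses no RDF library; the convention that variables are strings starting with "?" and the function name match are assumptions made for illustration.

```python
# One RDF statement: (subject, predicate, object), as in the data above.
DATA = [
    ("http://example.org/book/book1",
     "http://purl.org/dc/elements/1.1/title",
     "SPARQL Tutorial"),
]

def is_variable(term):
    """Treat any term starting with '?' as a query variable."""
    return term.startswith("?")

def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if is_variable(p):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            results.append(binding)
    return results

pattern = ("http://example.org/book/book1",
           "http://purl.org/dc/elements/1.1/title",
           "?title")
print(match(pattern, DATA))  # [{'?title': 'SPARQL Tutorial'}]
```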
RDF
RDF=Resource Description Framework
A language for describing metadata
Meta - "about"
Examples of Web metadata: content ratings, privacy specs, etc.
Graph-based (like the Web is graph-based!)
OWL
OWL=Web Ontology Language (er, yes)
OWL allows describing ontologies such as those mentioned earlier
Some notes on: J. Hendler, Agents and the Semantic Web, IEEE Intelligent Systems, March/April 2001, pp. 30-37.
Existing Ontologies Available for Use
A catalog of DAML ontologies is maintained at daml.org
It offers summaries, queries, and statistics
It can also be viewed in XML and DAML formats
Getting Ontologies to be Used
Two ways to put (a) named concepts throughout the Web:
i. A universal set of concepts that everyone uses
ii. Everyone makes up their own concepts
How is a concept to be named? XML is one way
Which is more realistic, i or ii? XML suits ii; i is already present in the form of words
Why would people intentionally name concepts?
It's more trouble than not doing it (if using XML)
But...it enhances the value of Web information
Value enhancement is a motivation
Problem: value is enhanced only if
apps exist that use the concepts
Why specify the concepts if important apps don't exist?
Problem: why write apps if concepts are not specified?
This is a "chicken-and-egg" problem (Hendler)
To solve, DARPA is funding research
So DARPA is trying to be the chicken (or egg)
Solution 1: DARPA
Solution 2: words instead of XML
Making concept specification easier
Tools that output HTML should do it automatically
The author doesn't even know it
Example:
You put a phone number in a document
(such as 294-3959)
The editor you use should automatically
- recognize it as a phone number
- tag it as such, e.g.
<phonenumber>294-3959</phonenumber>
- You should not even have to know
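A rough sketch of what such an editor feature might do, using a regular expression. The pattern is an assumption that covers only numbers shaped like 294-3959 or 406-587-1118; tag_phone_numbers is a hypothetical function name.

```python
import re

# Assumed format: a seven-digit local number (294-3959), optionally
# preceded by a three-digit area code (406-587-1118).
PHONE = re.compile(r"\b(?:\d{3}-)?\d{3}-\d{4}\b")

def tag_phone_numbers(text):
    """Wrap every recognized phone number in a <phonenumber> tag."""
    return PHONE.sub(
        lambda m: f"<phonenumber>{m.group(0)}</phonenumber>", text
    )

print(tag_phone_numbers("Call 294-3959 today."))
# Call <phonenumber>294-3959</phonenumber> today.
```

A bare number such as the 587 in "Flight 587" is left untagged, because it does not fit the assumed format.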
Example:
The editor automatically classifies the document in a given category, and adds
- a subject tag to the document
- the categories are determined by the editor's ontology
- the classification is done by comparing the document's keyword vector to the category's vector
- ...thereby connecting semantic categories to words
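The classification step could look roughly like this, assuming cosine similarity between word-count vectors. The category vectors, the `<subject>` tag name, and the function names are all made up for illustration; a real editor would take its categories from its ontology.

```python
import math
from collections import Counter

# Hypothetical category vectors (word -> weight).
CATEGORIES = {
    "cooking": Counter({"pot": 3, "food": 3, "heat": 2}),
    "pets": Counter({"dog": 3, "cat": 3, "food": 1}),
}

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def classify(text):
    """Return the category whose vector is most similar to the document's."""
    doc = Counter(text.lower().split())
    return max(CATEGORIES, key=lambda name: cosine(doc, CATEGORIES[name]))

def add_subject_tag(text):
    """Prepend an (assumed) <subject> tag naming the chosen category."""
    return f"<subject>{classify(text)}</subject>\n{text}"

print(classify("heat the pot and add food"))  # cooking
print(classify("my dog chased the cat"))      # pets
```

If no category shares any word with the document, every similarity is zero and this sketch simply returns the first category; a real tool would probably leave the document untagged instead.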
Example:
You add an image of a computer to a doc
The image is chosen from a list, using a menu
The editor automatically labels it with a <computer> tag
Actually this can be done very easily with the word (not XML) approach:
- Just name the image e.g. computer5.gif (not e.g. image0091.gif)
- Current image search engines are already based on file names
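A toy sketch of file-name-based image search, to show why computer5.gif is findable and image0091.gif is not. This is not how any particular search engine works; filename_keywords and search_images are hypothetical names.

```python
import os
import re

def filename_keywords(path):
    """Split a file name such as 'computer5.gif' into lowercase alphabetic words."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return re.findall(r"[a-z]+", stem.lower())

def search_images(query, paths):
    """Return the images whose file name contains the query word."""
    return [p for p in paths if query.lower() in filename_keywords(p)]

images = ["img/computer5.gif", "img/image0091.gif"]
print(search_images("computer", images))  # ['img/computer5.gif']
```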
Conclusion
Should/will future intelligent processing of the Web use ontologies or words?
Is an intermediate approach possible?
Should/do vivisimo.com and northernlight.com use ontologies or words?
Is an intermediate solution possible?
Drill-down directories (e.g. Yahoo, PathBinder) define their own ontologies
(How?)