Here is another term project option:
Consider the CiteSeer Web site
It gives graphs showing # citations
for each year of publication: e.g.
has the graph

showing that there are about 26 citations to papers published by Kreinovich in 1996 known to CiteSeer
You can surf from the above URL to find each citing paper and its year of publication
This project would mine the citing papers and create a table or graph showing # citations by the year the citing papers were published, instead of the year the cited paper was published.
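A minimal sketch of the tabulation step, assuming the citing papers have already been collected into (citing_title, citing_year, cited_year) records. That record layout, the helper names, and the sample records are hypothetical, not CiteSeer's actual data format. The only point is that the same records can be grouped by either year.

```python
from collections import Counter

# Hypothetical record layout: (citing_paper_title, citing_year, cited_year).
# Real records would be collected by following CiteSeer's citing-paper links.
Citation = tuple[str, int, int]


def citations_by_citing_year(citations: list[Citation]) -> dict[int, int]:
    """Count citations by the year the *citing* paper was published."""
    counts = Counter(citing_year for _, citing_year, _ in citations)
    return dict(sorted(counts.items()))


def citations_by_cited_year(citations: list[Citation]) -> dict[int, int]:
    """Count citations by the year the *cited* paper was published."""
    counts = Counter(cited_year for _, _, cited_year in citations)
    return dict(sorted(counts.items()))


def print_table(counts: dict[int, int]) -> None:
    """Print a crude text bar chart, one row per year."""
    for year, n in counts.items():
        print(f"{year}  {'#' * n} ({n})")


if __name__ == "__main__":
    # Made-up placeholder records, for illustration only.
    sample = [
        ("paper A", 1998, 1996),
        ("paper B", 1999, 1996),
        ("paper C", 1999, 1997),
    ]
    print_table(citations_by_citing_year(sample))
```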
Some Notes on Chapter 10, User Interfaces and Visualization
A strategic consideration:
It was suggested you skip around
When there is too much information,
- Browse! (same idea as you use with Web "browse"rs)
If you just read straight through
you wouldn't get far, and
what you read would be
- determined by chance, more or less
- ...instead of by your interests
How many skipped around? How?
Question: did the outline paragraph (bottom of p. 257) affect you?
Is it worth reading in this case?
In general?
(carefully? skimming? extractingly? when?)
Beginning the Information Access Process
Examples:
Invoking a search engine
Going to the library (remember them?)
Suppose
you do this with a goal in mind
How clear is your understanding, typically, of:
how to achieve the information access goal?
(complete?
mostly clear?
only fuzzy?
mostly uncertain?
not even a clue?)
Hearst says often ____________
fuzzy
Hence the need for a user interface that
- Helps you understand and state your needs
(These are different things)
Understand - know what you need
State - say what you need
So a
query-based system interface should
Help in composing the query
- What are shortcomings of
some search engines in this?
Make sure the query is
understandable
- Could a query not be
understandable?
Consider systems for retrieving information
Should their interfaces
Help users to specify their queries?
Help users decide what source(s)
to search?
Help users understand the return list?
Help users understand a given
returned item?
Help users keep track of what
they've done?
How would you rate a given search
engine on these things?
How might future technology be better
at some of these?
Evaluation issues (section 10.2.3)
1. Traditional evaluation of information
retrieval:
Emphasis on recall and precision
Traditional systems are non-interactive
What about interactive retrieval?
It is the modern approach
It requires a better user interface than
previously
Does the interactivity change the
importance of:
Recall?
Precision?
How?
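For reference, recall and precision have standard set-based definitions. The sketch below computes both, plus precision over only the top k results of a ranked list. The top-k variant is a common measure that I am adding as an aid to the interactivity question; the chapter does not prescribe it. Document IDs and relevance judgments are hypothetical.

```python
def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Precision over the top k ranked results (divides by k, the usual convention)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


if __name__ == "__main__":
    # Hypothetical document IDs and relevance judgments.
    ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
    relevant = {"d1", "d3", "d5", "d7"}
    print(precision(set(ranking), relevant))  # 0.5
    print(recall(set(ranking), relevant))     # 0.75
    print(precision_at_k(ranking, relevant, 3))
```

Note that recall depends on knowing every relevant document, including d7, which this system never returned.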
2. User studies ("Formal psychological
studies")
What problem did Byrd run into?
Other Problems:
Showing objective improvement
Issues like learning curve muddy
the waters
Any others?
"Formal psychological studies usually only uncover narrow conclusions within restricted contexts" (p. 262)
"Models of Interaction" (sec. 10.3.1)
Traditional model:
1. make query;
2. get list of all matching documents
How does this fall short?
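The following is a rough sketch of the traditional model. It assumes a tiny hypothetical collection and plain whitespace tokenization. A Boolean AND query over an inverted index returns every matching document as an unordered set.

```python
# Hypothetical toy collection: document ID -> text.
DOCS = {
    "d1": "powered parachuting safety",
    "d2": "parachuting history",
    "d3": "powered flight basics",
}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: term -> IDs of the documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def boolean_and(index: dict[str, set[str]], terms: list[str]) -> set[str]:
    """Traditional model: return *all* documents containing every term, unranked."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return set()
    return set.intersection(*postings)


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(boolean_and(idx, ["powered", "parachuting"]))  # {'d1'}
```

Because the result is a set, nothing in this model tells the user which of many matches to look at first.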
Some other models:
Interactive situations need rankings
Getting *all* is not so important in
interactive settings
What if the user doesn't know the right query?
What if the user doesn't know Boolean logic?
What if the user doesn't know what is wanted?
"I'll know it when I see it"
"Berry-picking" is another model
"Information foraging" is similar
Example:

Figure 1. Sample MultiBrowser screen from a repository created from documents obtained by the Web search engine query “powered parachuting” (no endorsement of any company is intended). KEY: the 4 numbers associated with each FIND SIMILAR link state, for its associated paragraph, how many of the six windows it points to will contain documents with, respectively, many incoming paragraph links, a modest number (n), few (f), and a composite “%” ranking equal to 100-8n-16f.
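The caption's composite ranking can be checked with a few lines of arithmetic. This sketch assumes that the "many" count is whatever remains of the six windows after n and f. The function name is hypothetical. With six windows, the percentage runs from 100 (all six "many") down to 4 (all six "few").

```python
WINDOWS = 6  # number of windows a FIND SIMILAR link points to, per the caption


def link_summary(n: int, f: int) -> tuple[int, int, int, int]:
    """Return (many, n, f, percent) for one FIND SIMILAR link.

    n: windows whose documents have a modest number of incoming paragraph links
    f: windows whose documents have few incoming paragraph links
    Assumption: 'many' is the remainder of the six windows.
    """
    if n < 0 or f < 0 or n + f > WINDOWS:
        raise ValueError("n and f must be non-negative and sum to at most 6")
    many = WINDOWS - n - f
    percent = 100 - 8 * n - 16 * f  # composite ranking from the figure caption
    return many, n, f, percent


if __name__ == "__main__":
    print(link_summary(2, 1))  # (3, 2, 1, 68)
```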
These models suggest
Non-query-driven retrieval
E.g. "Find Similar" links
Query reformulation support
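One common way to back a "Find Similar" link is to rank other documents by cosine similarity of term-frequency vectors. I am swapping that technique in here as an illustration. The MultiBrowser in the figure ranks by incoming paragraph links, which this sketch does not model. The documents and the `find_similar` name are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Raw term-frequency vector using whitespace tokenization."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(doc_id: str, docs: dict[str, str], k: int = 3) -> list[str]:
    """Return the IDs of the k documents most similar to doc_id (ties broken by ID)."""
    target = tf_vector(docs[doc_id])
    scored = [
        (cosine(target, tf_vector(text)), other)
        for other, text in docs.items()
        if other != doc_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


if __name__ == "__main__":
    docs = {
        "d1": "powered parachuting safety",
        "d2": "parachuting history",
        "d3": "powered flight basics",
        "d4": "cooking pasta",
    }
    print(find_similar("d1", docs, k=2))  # ['d2', 'd3']
```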
Consider 10.4, "Starting Points"
"users tend to start...with...short queries, inspect the results...then modify those queries"
Do you agree?
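A system can also help with the "modify those queries" step. One classic technique is Rocchio relevance feedback, which I am swapping in as an illustration rather than taking from this chapter. It moves the query vector toward documents the user marked relevant and away from those marked non-relevant. The weights alpha, beta, and gamma below are commonly quoted illustrative defaults, not tuned values. The term vectors are hypothetical.

```python
def rocchio(
    query: dict[str, float],
    relevant: list[dict[str, float]],
    nonrelevant: list[dict[str, float]],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> dict[str, float]:
    """Rocchio relevance feedback on sparse term-weight vectors.

    new = alpha*query + beta*mean(relevant) - gamma*mean(nonrelevant),
    keeping only terms with positive weight.
    """
    new_query: dict[str, float] = {}

    def add(vec: dict[str, float], scale: float) -> None:
        for term, weight in vec.items():
            new_query[term] = new_query.get(term, 0.0) + scale * weight

    add(query, alpha)
    for doc in relevant:
        add(doc, beta / len(relevant))  # loop body never runs when the list is empty
    for doc in nonrelevant:
        add(doc, -gamma / len(nonrelevant))
    return {term: w for term, w in new_query.items() if w > 0}


if __name__ == "__main__":
    q = {"parachuting": 1.0}
    rel = [{"parachuting": 1.0, "powered": 1.0}]
    non = [{"skydiving": 1.0}]
    print(rocchio(q, rel, non))  # {'parachuting': 1.75, 'powered': 0.75}
```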
"...search engines...plunge the user into the middle of a...site...with little
information about the relationship...to the...collection"
What is the "collection" here?
Is this bad?
Why?
Automated Source Selection
Consider this question:
"What is the best search engine?"
Source Selection
To pick sources well, use e.g.
user models
artificial intelligence, dialogues, etc.
Alternative: pick source based on query
Example: query on medical stuff
suggests Medline
Example: syntax of query suggests
search engine that understands
that query
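The following is a deliberately simple sketch of picking a source based on the query. It assumes a hand-made keyword list for the medical case. It also treats upper-case AND/OR/NOT as a sign of Boolean syntax. The keyword set and the non-MEDLINE source labels are hypothetical placeholders, not real services.

```python
import re

# Hypothetical keyword list; a real system would need a much richer model.
MEDICAL_TERMS = {"disease", "protein", "gene", "clinical", "drug", "cancer"}


def select_source(query: str) -> str:
    """Pick a source from the query's vocabulary and syntax (rule-based sketch)."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & MEDICAL_TERMS:
        return "MEDLINE"
    # Upper-case Boolean operators suggest an engine that understands Boolean syntax.
    if re.search(r"\b(AND|OR|NOT)\b", query):
        return "boolean-capable engine"
    return "general web search engine"


if __name__ == "__main__":
    print(select_source("gene expression in maize"))  # MEDLINE
    print(select_source("parachute AND engine"))      # boolean-capable engine
```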
Example:
biological taxonomy-aware
MEDLINE access (PathBinder)
see www.plantgenomics.iastate.edu/PathBinderH