Announcements

Monday, meet in Coover 1318 to observe and help assess demos

 

Recall:

 

 

 

   Models for Reader Interaction Systems   


Daniel Berleant

Iowa State University

Ames, Iowa 50011

berleant@iastate.edu

 

 

 

 

 

 

 

 

Some Relevant Previous Works


  • J.A. Waterworth and M.H. Chignell
      A model for information exploration
         Hypermedia 3, 1 (1991), 35-58
  • D. Ellis
      A behavioural approach to information retrieval system design
         Journal of Documentation, 45 (1989), 171-212
  • B. Shneiderman, D. Byrd, W.B. Croft
      Sorting out searching: A user-interface framework for text searches 
         Commun. ACM 41, 4 (April 1998), 95-98
  • I found these particularly useful
       . . . let me know if you have another favorite

 

 

 

 

 

 

 

 

Outline from Here


  • The Models 
  • A Framework for Organizing Models of Reader Interaction Systems 
  • Hybrid Systems Based on Two or More Models
  • Conclusion

 

 

 Ten Models  


  • The Hypermedia model
  • The Book model
  • The Newspaper model
  • The Citation model
  • The Reuse model
  • The Directory model
  • The Query model
  • The Text Mining model
  • The Composition model     
  • The Index model

 

 

 

   Hybrid Systems    


  • A hybrid system uses two or more models
    • we focus on models from separate leaves 

·         There are numerous potential hybrid systems

  # Models in hybrid  

  # Systems  

2

39

3

74

4

68

5

24

Total

205

  • Some already exist
  • Others are novel to a greater or lesser degree
  • (The paper lists 39)

 

 

 

 

 

 

 

 

 

 

 Some Hybrid Systems


  1. Book and Hypermedia: long HTML documents could benefit from page numbers and a clickable automatically generated index at the end
  2. Book and Query: divide long on-line documents into page-sized files and index with a search engine. Use Web browser and access a search engine to browse it. See http://class.ee.iastate.edu/berleant/home/Research/DWD/ WebBookPager/index.htm
  3. Newspaper, Query, and Index: automatically query a search engine nightly. URLs of any documents that are changed, new, or deleted since the previous night are emailed to the subscriber
  4. Citation and Hypermedia: references at the end of a scholarly paper could profitably state where in the paper they were cited. Example: see this paper in the CIKM proceedings
  5. Directory and Citation: Science Citation Index and Social Science Citation Index are traditional examples; ResearchIndex is on-line, free, and even presented at this CIKM 

 

 

 

 

 

 

REFERENCES

[1]  Benest, I.D. A hypertext system with controlled 
       hype. In McAleese, R. and Green, C. (eds.). 
       Hypertext: State of the Art. Intellect Books, 
       Oxford, 1990.                                     (Cited in Sec. 3.2)
[2]  Berghel, H., Berleant, D., Foy, T., and  McGuire, M. 
       Cyberbrowsing: Information customization on the Web. 
       Journal of the American Society for Information 
       Science (JASIS), 50, 10 (May 1999), 505-511. 
                                        (Cited in Sec’s. 2, 3.1, & 4 item 6)
[3]  Bollacker, K., Lawrence, S., and Giles, C.L. 
       CiteSeer: An autonomous Web agent for automatic 
       retrieval and identification of interesting 
       publications. In Proceedings of the Second 
       International ACM Conference on Autonomous Agents, 
       ACM Press, 1998.                    (Cited in Sec. 4 item 32)
       

 

 

 

 

 

 

 

Today

 

Finish parsing

 

Parsing

Parsing means checking if a string is

      grammatical (=matches the pattern)

Does not address meaning of the symbols

Example 1:
A compiler has a parser (“front end”)

…and…
a code generator that assigns “meaning” to strings in the language (“back end”)

Example 2:
B*C is a string…

matching the “arithmetic expression” pattern

…or matching the regular expression pattern   

But its meaning is undefined

It could be a product

It could be a repetition and concatenation

Nondeterministic FSMs can parse regular expressions

…but they can’t parse context free grammars

…so what can we do?

 

How to parse context free grammars

Several methods exist

From simple to complex

From slow to fast

From slow and simple, to fast and complex

      Why? The “tradeoff” concept

(So why no slow and complex methods?)

(Why no fast and simple methods?)

Anyway…

Simple and slow is often best

Top down parsing is simplest

Bottom up parsing is less simple

LALR parsing is even less simple

LALR
= Look Ahead Left Recursive”


Example Web site

CHAPTER 19

LALR(1) Grammar


This chapter presents a grammar for Java. The grammar has been mechanically checked to insure that it is LALR(1).

The grammar for Java presented piecemeal in the preceding chapters is much better for exposition, but it cannot be parsed left-to-right with one token of lookahead because of certain syntactic peculiarities, some of them inherited from C and C++. These problems and the solutions adopted for the LALR(1) grammar are presented below, followed by the grammar itself.

 


Notes:

LALR(1) is a subtype of LALR

The parsing area has many complexities…

…so take a compiler course!

 

Example Web site #2:

http://www.eng.auburn.edu/users/wenchen/course/6210/week10/1.gif

As mentioned, there are many complexities

 

Example Web site #3:

http://www.parsifalsoft.com/

----------------------

PARSIFAL SOFTWARE

Trial Copy
Glossary
Example
Freeware
Introduction
Contact Parsifal

 AnaGram
 LALR(1) Parser Generator
Now shipping: AnaGram 2.01 for Windows


 

 

Download Trial Copy for Windows

----------------------

 *new Creating thread-safe parsers with AnaGram

 *new More support for C++ parsers

The AnaGram Parser Generator

Here's how it works: You create a set of rules, a grammar, that describes the input to your computer program. You can try the grammar right away interactively, without writing code, using AnaGram's to File Trace pageFile Trace and to Grammar Trace pageGrammar Trace. Then attach C or C++ code to the rules, and tell AnaGram to build a parser for you.

The parser is a C/C++ function that parses text according to the rules in your grammar and, as it matches rules, calls your code to deal with them. Using a grammar means you get faster development, easier modification and maintainability, and fewer bugs in your software. You use an easy-to-understand description of input instead of bug-prone, fragile, branching code. You make the rules, and AnaGram makes a parser for your input.

AnaGram Makes It Easy!

AnaGram is explicitly designed to be easy to use, with a powerful notation for representing grammars, so you can write grammars fast.

Since AnaGram takes care of all the complex parsing logic, you'll get programs up and running quickly. You won't need all that heavily nested conditional logic where bugs love to hide.

What's more, you'll find that program maintenance becomes much easier. Not only can you work from a description, but certain of AnaGram's special features make it particularly easy to modify grammars. When the time comes to make improvements to your programs, AnaGram will still be helping you.

AnaGram has an interactive environment equipped with unique, highly capable tools for visual debugging, viewing and testing of grammars and parsers. Now, you can put the powerful technique of syntax directed parsing to work, to speed development of projects both large and small.

The C/C++ parser module generated by AnaGram from your description can be compiled on virtually any platform. By incorporating the parser into a DLL, you can even use it in Delphi or Visual Basic programs.

About AnaGram 2.01:

AnaGram 2.01 runs on Win32 platforms: Win9x,ME,NT,2000. AnaGram generates ANSI compatible platform independent C or C++ code. No runtime libraries are required. A single user commercial license costs US$495 including shipping. Contact Parsifal for further information.

Use AnaGram For: keyboard input, database queries, script languages, configuration files, file translators, domain specific languages, interpreters, compilers...

More Information:

AnaGram Examples

----------------------

For more information:

e-mail: info@parsifalsoft.com

Parsifal Software
P.O. Box 219
Wayland, MA 01778

 

User comments about AnaGram...

www.parsifalsoft.com
1-800-879-2577
Voice/Fax: 1-508-358-2564

----------------------

Parsifal SoftwareA few puns, in parsing...

Website comments? webmaster@parsifalsoft.com

 

Top down parsing is considerably simpler than LALR parsing…


 

Top Down Parsing of Context Free Grammars

Each rule has its own function (method)

Here is a grammar for arithmetic expressions (after Sedgewick):

<expression>::=<term> | <term>+<expression>

<term>::=<factor> | <factor> * <term>

<factor>::=(<expression>) | <letter>

 

Recall the (somewhat) similar grammar describing all regular expressions

(adapted from Sedgewick)

 

<expression>::=<term> | <term>OR<expression>

<term>::=<factor> | <factor> <term>

<factor>::=(<expression>) | v | <factor> *

 


“Top down:”

Start from the top-level symbol:

<expression>

Try to satisfy its requirements

      Succeed: yes, it is a regular expression

      Fail: no, it is not a regular expression

Regular expression example:
( a * b OR a c )

Note that the parse provides a natural accounting of rules of precedence: recall

<expression>::=<term> | <term>OR<expression>

<term>::=<factor> | <factor> <term>

<factor>::=(<expression>) | v | <factor> *

 

Here is the function for <expression>: (adapted from Sedgewick)


The function for <expression>:

 

expression( ){

   term( )

   if (current token is OR)

      move to next token

   expression( )

}

 

Recall:

<expression>::=<term> | <term>OR<expression>

Here is the function for <term>

term( ){

   factor( )

   if (current token is `(`

       or is a letter

      )

      term( )

}

Recall:

<term>::=<factor> | <factor> <term>

<factor>::=(<expression>)|v|<factor> *


Here is the function for <factor>

factor( ){

   if (current token

       is `(` ){

      move to next token

      expression( )

      if (current token

          is `)`)

      move to next character

      else parse fails

   }else

      if (current token

          is a letter)

      move to next token

      <