Announcements

Monday, meet in Coover 1318 to observe and help assess demos

 

Recall:

 

 

 

   Models for Reader Interaction Systems   


Daniel Berleant

Iowa State University

Ames, Iowa 50011

berleant@iastate.edu

 

 

 

 

 

 

 

 

Some Relevant Previous Works


  • J.A. Waterworth and M.H. Chignell
      A model for information exploration
         Hypermedia 3, 1 (1991), 35-58
  • D. Ellis
      A behavioural approach to information retrieval system design
         Journal of Documentation, 45 (1989), 171-212
  • B. Shneiderman, D. Byrd, W.B. Croft
      Sorting out searching: A user-interface framework for text searches 
         Commun. ACM 41, 4 (April 1998), 95-98
  • I found these particularly useful
       . . . let me know if you have another favorite

 

 

 

 

 

 

 

 

Outline from Here


  • The Models 
  • A Framework for Organizing Models of Reader Interaction Systems 
  • Hybrid Systems Based on Two or More Models
  • Conclusion

 

 

 Ten Models  


  • The Hypermedia model
  • The Book model
  • The Newspaper model
  • The Citation model
  • The Reuse model
  • The Directory model
  • The Query model
  • The Text Mining model
  • The Composition model     
  • The Index model

 

 

 

   Hybrid Systems    


  • A hybrid system uses two or more models
    • we focus on models from separate leaves 

·         There are numerous potential hybrid systems

  # Models in hybrid  

  # Systems  

2

39

3

74

4

68

5

24

Total

205

  • Some already exist
  • Others are novel to a greater or lesser degree
  • (The paper lists 39)

 

 

 

 

 

 

 

 

 

 

 Some Hybrid Systems


  1. Book and Hypermedia: long HTML documents could benefit from page numbers and a clickable automatically generated index at the end
  2. Book and Query: divide long on-line documents into page-sized files and index with a search engine. Use Web browser and access a search engine to browse it. See http://class.ee.iastate.edu/berleant/home/Research/DWD/ WebBookPager/index.htm
  3. Newspaper, Query, and Index: automatically query a search engine nightly. URLs of any documents that are changed, new, or deleted since the previous night are emailed to the subscriber
  4. Citation and Hypermedia: references at the end of a scholarly paper could profitably state where in the paper they were cited. Example: see this paper in the CIKM proceedings
  5. Directory and Citation: Science Citation Index and Social Science Citation Index are traditional examples; ResearchIndex is on-line, free, and even presented at this CIKM 

 

 

 

 

 

 

REFERENCES

[1]  Benest, I.D. A hypertext system with controlled 
       hype. In McAleese, R. and Green, C. (eds.). 
       Hypertext: State of the Art. Intellect Books, 
       Oxford, 1990.                                     (Cited in Sec. 3.2)
[2]  Berghel, H., Berleant, D., Foy, T., and  McGuire, M. 
       Cyberbrowsing: Information customization on the Web. 
       Journal of the American Society for Information 
       Science (JASIS), 50, 10 (May 1999), 505-511. 
                                        (Cited in Sec’s. 2, 3.1, & 4 item 6)
[3]  Bollacker, K., Lawrence, S., and Giles, C.L. 
       CiteSeer: An autonomous Web agent for automatic 
       retrieval and identification of interesting 
       publications. In Proceedings of the Second 
       International ACM Conference on Autonomous Agents, 
       ACM Press, 1998.                    (Cited in Sec. 4 item 32)
       

 

 

 

 

 

 

 

Today

 

Finish parsing

 

Parsing

Parsing means checking if a string is

      grammatical (=matches the pattern)

Does not address meaning of the symbols

Example 1:
A compiler has a parser (“front end”)

…and…
a code generator that assigns “meaning” to strings in the language (“back end”)

Example 2:
B*C is a string…

matching the “arithmetic expression” pattern

…or matching the regular expression pattern   

But its meaning is undefined

It could be a product

It could be a repetition and concatenation

Nondeterministic FSMs can parse regular expressions

…but they can’t parse context free grammars

…so what can we do?

 

How to parse context free grammars

Several methods exist

From simple to complex

From slow to fast

From slow and simple, to fast and complex

      Why? The “tradeoff” concept

(So why no slow and complex methods?)

(Why no fast and simple methods?)

Anyway…

Simple and slow is often best

Top down parsing is simplest

Bottom up parsing is less simple

LALR parsing is even less simple

LALR
= Look Ahead Left Recursive”


Example Web site

CHAPTER 19

LALR(1) Grammar


This chapter presents a grammar for Java. The grammar has been mechanically checked to insure that it is LALR(1).

The grammar for Java presented piecemeal in the preceding chapters is much better for exposition, but it cannot be parsed left-to-right with one token of lookahead because of certain syntactic peculiarities, some of them inherited from C and C++. These problems and the solutions adopted for the LALR(1) grammar are presented below, followed by the grammar itself.

 


Notes:

LALR(1) is a subtype of LALR

The parsing area has many complexities…

…so take a compiler course!

 

Example Web site #2:

http://www.eng.auburn.edu/users/wenchen/course/6210/week10/1.gif

As mentioned, there are many complexities

 

Example Web site #3:

http://www.parsifalsoft.com/

----------------------

PARSIFAL SOFTWARE

Trial Copy
Glossary
Example
Freeware
Introduction
Contact Parsifal

 AnaGram
 LALR(1) Parser Generator
Now shipping: AnaGram 2.01 for Windows


 

 

Download Trial Copy for Windows

----------------------

 *new Creating thread-safe parsers with AnaGram

 *new More support for C++ parsers

The AnaGram Parser Generator

Here's how it works: You create a set of rules, a grammar, that describes the input to your computer program. You can try the grammar right away interactively, without writing code, using AnaGram's to File Trace pageFile Trace and to Grammar Trace pageGrammar Trace. Then attach C or C++ code to the rules, and tell AnaGram to build a parser for you.

The parser is a C/C++ function that parses text according to the rules in your grammar and, as it matches rules, calls your code to deal with them. Using a grammar means you get faster development, easier modification and maintainability, and fewer bugs in your software. You use an easy-to-understand description of input instead of bug-prone, fragile, branching code. You make the rules, and AnaGram makes a parser for your input.

AnaGram Makes It Easy!

AnaGram is explicitly designed to be easy to use, with a powerful notation for representing grammars, so you can write grammars fast.

Since AnaGram takes care of all the complex parsing logic, you'll get programs up and running quickly. You won't need all that heavily nested conditional logic where bugs love to hide.

What's more, you'll find that program maintenance becomes much easier. Not only can you work from a description, but certain of AnaGram's special features make it particularly easy to modify grammars. When the time comes to make improvements to your programs, AnaGram will still be helping you.

AnaGram has an interactive environment equipped with unique, highly capable tools for visual debugging, viewing and testing of grammars and parsers. Now, you can put the powerful technique of syntax directed parsing to work, to speed development of projects both large and small.

The C/C++ parser module generated by AnaGram from your description can be compiled on virtually any platform. By incorporating the parser into a DLL, you can even use it in Delphi or Visual Basic programs.

About AnaGram 2.01:

AnaGram 2.01 runs on Win32 platforms: Win9x,ME,NT,2000. AnaGram generates ANSI compatible platform independent C or C++ code. No runtime libraries are required. A single user commercial license costs US$495 including shipping. Contact Parsifal for further information.

Use AnaGram For: keyboard input, database queries, script languages, configuration files, file translators, domain specific languages, interpreters, compilers...

More Information:

AnaGram Examples

----------------------

For more information:

e-mail: info@parsifalsoft.com

Parsifal Software
P.O. Box 219
Wayland, MA 01778

 

User comments about AnaGram...

www.parsifalsoft.com
1-800-879-2577
Voice/Fax: 1-508-358-2564

----------------------

Parsifal SoftwareA few puns, in parsing...

Website comments? webmaster@parsifalsoft.com

 

Top down parsing is considerably simpler than LALR parsing…


 

Top Down Parsing of Context Free Grammars

Each rule has its own function (method)

Here is a grammar for arithmetic expressions (after Sedgewick):

<expression>::=<term> | <term>+<expression>

<term>::=<factor> | <factor> * <term>

<factor>::=(<expression>) | <letter>

 

Recall the (somewhat) similar grammar describing all regular expressions

(adapted from Sedgewick)

 

<expression>::=<term> | <term>OR<expression>

<term>::=<factor> | <factor> <term>

<factor>::=(<expression>) | v | <factor> *

 


“Top down:”

Start from the top-level symbol:

<expression>

Try to satisfy its requirements

      Succeed: yes, it is a regular expression

      Fail: no, it is not a regular expression

Regular expression example:
( a * b OR a c )

Note that the parse provides a natural accounting of rules of precedence: recall

<expression>::=<term> | <term>OR<expression>

<term>::=<factor> | <factor> <term>

<factor>::=(<expression>) | v | <factor> *

 

Here is the function for <expression>: (adapted from Sedgewick)


The function for <expression>:

 

expression( ){

   term( )

   if (current token is OR)

      move to next token

   expression( )

}

 

Recall:

<expression>::=<term> | <term>OR<expression>

Here is the function for <term>

term( ){

   factor( )

   if (current token is `(`

       or is a letter

      )

      term( )

}

Recall:

<term>::=<factor> | <factor> <term>

<factor>::=(<expression>)|v|<factor> *


Here is the function for <factor>

factor( ){

   if (current token

       is `(` ){

      move to next token

      expression( )

      if (current token

          is `)`)

      move to next character

      else parse fails

   }else

      if (current token

          is a letter)

      move to next token

      else parse fails

   if (current token is `*`)

      move to next token    

}

Recall:

<factor>::=(<expression>)|v|<factor> *

Note: some context free grammars go into infinite loops when parsed this way (use a more complicated way in those cases)

Context free grammar type rules are powerful

They can make a programming language!

            Prolog (see figure)

 

Generating Strings is Easy

We could use the regular expression grammar to generate a regular expression

We could then use the regular expression to generate a string

Let’s try…

Parsing strings can entail false starts, as we’ve seen

An ultimately successful result is called a parse tree

(see Fig., Sedgewick p. 271)

 

Compilers

      Basic idea: translate between languages

            (Generally  to a simpler one)

      Examples:

Java to Java byte code

            C to Intel chip machine language

            Regular expressions to nFSMs

Compiling = parsing + code generation

      Example of code generation:

      For the arithmetic expression parser

            If ‘+’

then issue a 8086 chip instruction for adding

      (same instruction should work on

modern x86 chips as well)

 

Another example of code generation

      For the regular expression functions:

            Improve them to output FSM chunks

                  Either pictures or pieces of tables

                                  (See Sedgewick for details)

      For a particular regular expression

            Check a string

Output a message (the “code”) at each state

(See figure, Sedgewick p. 260)

 

 

 

 

Going the other way – “decompilation”

      Decompilation is a harder problem

      Consider for example:

      Two different C programs could have

the same machine code

      How could this happen?

Clearly, perfect decompilation must be impossible

 

 

 

 

 

 

 

 

Compiler-Compilers

First, some vocabulary…

Some Terms Related to Parsing

A language is a set of strings

A grammar is a set of rules describing a language

(We’ve called both of these “patterns”)

 

 

 

 

 

 

 

 

 

Example Languages

Example 1:
The set of all regular expressions

   We saw how to describe this language

             using a context free grammar

  containing three rules

1.  <expression>::=
<term> | <term>OR<expression>

2.  <term>::=<factor>|<factor> <term>

3.  <factor>::=(<expression>)|v|<factor>*

 

Note: ‘v’ is any character

Example 2:
The strings matching a given regular expression

    (e.g. Sedgewick p. 270)

 

 

Examples 3:
N open parens followed by N close parens

All balanced parenthesis sequences

All balanced delimiter sequences

Example 4:
C

Put another way,

“The set of all strings that are valid C programs”

Examples 5-7: Pascal, Java, Perl

 

What is a Compiler-Compiler?

A grammar describes a set of strings

A regular expression is a simple grammar

A context-free grammar is more powerful

A string can be parsed

A parser can output new stuff

– it is a compiler that does “compilation”

Recall: compiler=parser+code generator

A parser generator parses a grammar

and translates it into a parser

E.g. give it rules and it outputs software

If a parser is a kind of compiler…

and a parser generator creates parsers

A parser generator is

a compiler-compiler

(because it compiles grammars into compilers)

I.e. it translates grammars into compilers

(it “compiles grammars into compilers”)