Dialogues With Documents: Customized Interactions With Texts

  • The volume of available information has been growing for many decades

  • Some exciting recent developments in information distribution

    • The Internet

    • The World Wide Web

        • Built upon an internet foundation

        • Brings new importance to Vannevar Bush's original vision of hypertext

          • ("As We May Think", Atlantic Monthly, July 1945)

    • Digital Libraries

        • Typically built upon a WWW foundation

        • Breathes life into H. G. Wells's vision

          • (World Brain, Doubleday, 1938)









A Pipeline for Information

  • Traditional information flow:

  • Augmented version showing Information Customization:

  • Examples of information customization

    • Bibliographic database searches

    • Interactive data visualization
      • (e.g. work at Pacific Northwest National Laboratory)

    • Computer assisted foreign language learning
      • (e.g. Berleant et al., in Computer Assisted Language Learning, 1997 no. 2)

    • Customized access to documents

  • A few more details: Berleant and Berghel, "Customizing Information", Computer, Sept. & Oct. 1994








Customized Access to Documents I: HyperBrowser

  • Occurrences of similar words are formed into hyperlinked chains

  • Links connect "most similar sentences sharing a stem"

  • (Exception: the end of a chain links back to its beginning)

  • HyperBrowser:

    • Uses preexisting Web clients as interfaces

    • Is prescriptive

      • (Alas, hyperlinks have just one destination)

    • Is interactive

    • Users follow custom pathways through the document

    • Supports low cognitive overhead interaction

    • Has real time navigation

    • Has non-dialogue interaction style
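
A minimal sketch of the chain idea, under stated assumptions: `crude_stem` is a hypothetical stand-in for a real stemmer, and each sentence simply links to the next sentence sharing the stem, rather than to the "most similar" one as HyperBrowser does. The last sentence in a chain links back to the first.

```python
import re
from collections import defaultdict

def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_chains(sentences):
    """Return {stem: {sentence index: index of the next sentence in the chain}}."""
    occurrences = defaultdict(list)
    for i, sentence in enumerate(sentences):
        stems = {crude_stem(w) for w in re.findall(r"[a-z]+", sentence.lower())}
        for stem in stems:
            occurrences[stem].append(i)
    chains = {}
    for stem, idxs in occurrences.items():
        if len(idxs) > 1:
            # The modulo makes the end of the chain link to its beginning.
            chains[stem] = {idxs[k]: idxs[(k + 1) % len(idxs)]
                            for k in range(len(idxs))}
    return chains

sentences = ["Dogs bark loudly.", "A cat sleeps.",
             "The dog runs.", "Cats and dogs play."]
chains = build_chains(sentences)
```

Each entry in a chain corresponds to one hyperlink with a single destination, which is why this style of browsing is prescriptive.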














Which Passage Matches the Query Best?

  • Let M be the current best matching passage
    M = 1st passage in document;
    for each passage P in document
      if (P matches query better than M)
         M=P;
    

  • Measuring degree of match - Algorithm 1
    (Given query Q, passage P)
    Q'=getContentWords(Q);
    P'=getContentWords(P);
    return # words of Q' in P';
    

  • Measuring degree of match - Algorithm 1a
    . . . return # words of P' in Q';
    

  • Example:

        P=Dogs are dogs.
        Q=What are some characteristics of dogs?

        Algorithm 1: match value=1
        Algorithm 1a: match value=2

  • One Solution: a more sophisticated "cosine" algorithm

  • Problems:
    • Many algorithm variations exist
    • It is not always clear which is best
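
The best-match loop and the two counting measures above can be sketched as follows. The stop-word list and the tokenizer are assumptions made for illustration; the slides do not specify how getContentWords works.

```python
import re

# Assumed stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "what", "some", "and", "in", "to"}

def get_content_words(text):
    """Lowercase, tokenize, and drop stop words (duplicates are kept)."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]

def match_1(query, passage):
    """Algorithm 1: number of query content words found in the passage."""
    q, p = get_content_words(query), get_content_words(passage)
    return sum(1 for w in q if w in p)

def match_1a(query, passage):
    """Algorithm 1a: number of passage content words found in the query."""
    q, p = get_content_words(query), get_content_words(passage)
    return sum(1 for w in p if w in q)

def best_passage(query, passages, match=match_1):
    """Linear scan that keeps the best matching passage seen so far."""
    best = passages[0]
    for p in passages:
        if match(query, p) > match(query, best):
            best = p
    return best

P = "Dogs are dogs."
Q = "What are some characteristics of dogs?"
print(match_1(Q, P), match_1a(Q, P))  # prints "1 2"
```

The asymmetry between the two counts is what motivates a normalized measure such as cosine similarity.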








An N-Gram Based Strategy

  • (Given query Q, passage P)
    Q'=getContentWords(Q);
    P'=getContentWords(P);
    Q''=getNgrams(Q',n);
    P''=getNgrams(P',n);
    return match(Q'', P'')
    
    (where match(Q'',P'') is e.g. a cosine measure)

  • getNgrams(WordList, n)
      ngrams={ };
      for each word W in WordList
          ngrams=ngrams + substrings(W,n);
      return ngrams;
    

  • substrings("seminar",4) returns
      semi, emin, mina, inar

  • We've often used n=4

  • Mu (1997) allowed users to set n
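
A minimal sketch of the n-gram strategy with a cosine measure. Two choices here are assumptions, not taken from the slides: words no longer than n are kept whole, and n-grams are counted as a bag (a Counter) rather than a set.

```python
import math
from collections import Counter

def substrings(word, n):
    """All contiguous substrings of length n; short words are kept whole (assumption)."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def get_ngrams(words, n):
    """Bag of character n-grams over a list of content words."""
    grams = Counter()
    for w in words:
        grams.update(substrings(w, n))
    return grams

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

q_grams = get_ngrams(["seminar"], 4)
p_grams = get_ngrams(["seminars"], 4)
score = cosine(q_grams, p_grams)
```

Under these assumptions, two word lists score 1.0 when their n-gram counts are proportional and 0.0 when they share no n-grams, so morphological variants such as "seminar" and "seminars" score high without a stemmer.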








An N-Gram Based Strategy With Variable N

  • (Given query Q, passage P)
    Q'=getContentWords(Q);
    P'=getContentWords(P);
    Q''=getNgrams(Q'); // Q''=getNgrams(Q',n);
    P''=getNgrams(P'); // P''=getNgrams(P',n);
    return match(Q'', P'')
    
    (where match(Q'',P'') is e.g. a cosine measure)

  • getNgrams(WordList) // getNgrams(WordList, n)
      ngrams={ };
      for each word W in WordList
       // ngrams=ngrams + substrings(W,n);
          ngrams=ngrams
                 + substrings(W,round(0.7*length(W)));
      return ngrams;
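
A sketch of the variable-n variant, in which n scales with the length of each word. The 0.7 ratio comes from the pseudocode above; the floor of 1 on n and the use of Python's built-in round (which rounds exact halves to the nearest even integer) are assumptions.

```python
from collections import Counter

def variable_ngrams(words, ratio=0.7):
    """Bag of n-grams where n = round(ratio * len(word)) for each word."""
    grams = Counter()
    for w in words:
        # Python's round() sends exact halves to the even neighbor;
        # the slides do not say which rounding rule was intended.
        n = max(1, round(ratio * len(w)))
        grams.update(w[i:i + n] for i in range(len(w) - n + 1))
    return grams

grams = variable_ngrams(["seminar", "dogs"])
```

With this rule a 7-letter word is cut into 5-grams and a 4-letter word into 3-grams, so long words must agree on longer stretches before they count as a match.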
    








Customized Access to Documents II: CyberBrowser

  • (Ref.: Berghel, Berleant, Foy, & McGuire, Journal of the American Society for Information Science (JASIS), in press)

  • List of "most important words" is automatically extracted

  • Displays chart of which sentences contain which listed words

  • Reader interactively, graphically specifies boolean query on listed words

  • Displays extract of sentences conforming to query

  • CyberBrowser:

    • Was made into a web client
      • (Almost any document display system can easily be made into a web client!)

    • Is nonprescriptive (to a degree!)

    • Is interactive

    • Has rather high cognitive overhead interface

    • Supports real time navigation

    • Has dialogue-like interaction
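
The CyberBrowser workflow above can be sketched roughly as follows. Everything specific here is an assumption for illustration: "most important words" is approximated by raw frequency after stop-word removal, and the boolean query is reduced to three word lists (all_of, any_of, none_of) in place of the graphical interface.

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "are", "of", "at", "and", "in", "to"}  # assumed

def split_sentences(text):
    """Naive sentence splitter on ., ! and ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words_of(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))

def important_words(sentences, k=5):
    """The k words that occur in the most sentences (a frequency heuristic)."""
    counts = Counter(w for s in sentences
                     for w in sorted(words_of(s)) if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(k)]

def chart(sentences, listed):
    """One row per sentence, one column per listed word: True if present."""
    return [[w in words_of(s) for w in listed] for s in sentences]

def extract(sentences, all_of=(), any_of=(), none_of=()):
    """Sentences satisfying the boolean query, in document order."""
    result = []
    for s in sentences:
        ws = words_of(s)
        if (all(w in ws for w in all_of)
                and (not any_of or any(w in ws for w in any_of))
                and not any(w in ws for w in none_of)):
            result.append(s)
    return result

sentences = split_sentences("Dogs chase cats. Cats sleep. Dogs bark at night.")
print(extract(sentences, all_of=["dogs"], none_of=["cats"]))
# prints ['Dogs bark at night.']
```

The chart corresponds to the display of which sentences contain which listed words, and extract corresponds to the displayed extract of conforming sentences.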














Customized Access to Documents III: MultiBrowsers

  • Feature "multiway lookahead"

  • Displays several subwindows

  • Each subwindow contains a passage

  • Each query refreshes all subwindows with new content

  • The MultiBrowsers:

    • Included standalone, web client, and CGI versions

    • Are nonprescriptive

    • Are interactive

    • Have rather high cognitive overhead interfaces

    • Are slow

    • Support dialogue-like interaction

    • Support multiway lookahead

  • The next step: true document dialogue!




















Dialogues With Documents - the Next Phase: Some Specs

  • Should have multiway lookahead

    • Multiple responses raise probability of a good one

    • . . . and more than one good one is possible

    • Fruitless queries will be less frequent

    • Hence: each user action provides more value

  • Should it have "split subwindows"?

    • Retrieved passage visible while scrolling upper half

    • Retrieved passage easy to find when scrolling lower half

    • Variations on the basic concept possible

    • Consider standard scrollable text boxes -

      • Are split subwindows really better?

      • (They may sound better but this is unproven)














Dialogues With Documents - the Next Phase: Specs II

  • Navigation must be easy and flexible

    • Useful, low cognitive overhead option:

        - Clicking on retrieved passage makes it the next query

    • Useful, expressive option:

        - Type-in box allows highly flexible queries

    • History navigation:

        - Important in web browsers . . . same for document dialogue?

    • "Sticky" subwindows retain their contents

        - The composition metaphor can be valuable








Dialogues With Documents - the Next Phase: Specs III

  • Response times must be brisk

    • An important HCI guideline - therefore:

    • Update one or more subwindows immediately

    • Retain each passage until its replacement is ready

    • Use the FastCGI protocol

    • (Or modify server via its API)

  • Displayed content must have high quality

    • Fill all subwindows - no blanks

    • Avoid duplication among subwindows

    • Have a meta-dialogue subwindow

    • Have a clickable permuted index subwindow

    • Include "sophisticated" retrieval algorithms
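
Two of the content-quality requirements above, filling every subwindow and avoiding duplication among subwindows, can be sketched as a ranked selection. The function names, the word-overlap score, and the whitespace- and case-insensitive notion of "duplicate" are all assumptions for illustration.

```python
import re

def word_overlap(query, passage):
    """Assumed placeholder score: number of query words present in the passage."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    p = set(re.findall(r"[a-z]+", passage.lower()))
    return len(q & p)

def fill_subwindows(query, passages, score=word_overlap, k=3):
    """Pick up to k distinct passages, best first, for k subwindows."""
    # Ranking every passage (even zero-score ones) means no subwindow is
    # left blank as long as the document has k distinct passages.
    ranked = sorted(passages, key=lambda p: score(query, p), reverse=True)
    shown, seen = [], set()
    for p in ranked:
        key = " ".join(p.lower().split())  # normalize case and spacing
        if key not in seen:
            seen.add(key)
            shown.append(p)
        if len(shown) == k:
            break
    return shown

passages = ["Dogs bark.", "dogs  bark.", "Cats sleep.",
            "Dogs and cats play.", "Fish swim."]
windows = fill_subwindows("dogs", passages)
```

Returning several ranked passages per query is also one simple way to realize multiway lookahead: each user action yields k candidate responses instead of one.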














Dialogues With Documents - the Next Phase: Design Strategy

  • System closely associated with a server

      - so it's runnable from Netscape/Microsoft browsers

  • System built from reusable software components

      - to support long term (unknowable) implementation needs

  • System uses the FastCGI protocol

      - so document can be preprocessed just once per session

  • Build index in preprocessing

      - user friendly and supports retrieval
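
One possible shape for an index built once during preprocessing is an inverted index from words to passage numbers. This is a sketch under assumptions, not the design described in the slides: the function names are hypothetical, and a persistent server process (such as one run under FastCGI) is assumed to keep the index in memory between queries.

```python
import re
from collections import defaultdict

def build_index(passages):
    """Preprocessing step: map each word to the set of passages containing it."""
    index = defaultdict(set)
    for i, passage in enumerate(passages):
        for word in re.findall(r"[a-z]+", passage.lower()):
            index[word].add(i)
    return dict(index)

def candidates(index, query):
    """Per-query step: passages sharing at least one word with the query."""
    ids = set()
    for word in re.findall(r"[a-z]+", query.lower()):
        ids |= index.get(word, set())
    return sorted(ids)

passages = ["Dogs bark.", "Cats sleep.", "Dogs and cats play."]
index = build_index(passages)
hits = candidates(index, "sleepy dogs")
```

Only the candidate passages then need to be scored by a more expensive matching algorithm, which helps keep response times brisk.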








Some Other Related Work I

  • HCI and IR and . . .

  • Automatic abstracting and extracting

    • Examined starting in the 1950s

    • Can involve passage retrieval

    • Can be done in a custom manner

    • Focus tends toward batch processing

    • Focus tends toward template filling

  • Electronic books

    • Book length documents are a shared concern

    • Ebooks could support document dialogue

    • Focus tends toward hardware, currently

    • This may change








Some Other Related Work II

  • Meta-dialogue

    • Examined as early as ELIZA

    • Important in CAI

    • Field currently unstructured

    • True dialogue seems to require it

  • Question answering

    • E.g. FAQ Finder and related systems

    • Similarities to document dialogue, but . . .

    • Purpose is different

    • More like web+browser than document dialogue








Dialogues With Documents - Conclusion

  • Dialogue is more engaging than reading

      - because it customizes what is presented

  • Users already scan and pick out passages

      - for online material (Nielsen, CACM, Jan. 1999)

  • Works with books or sets of web documents

      - could handle the "too many web documents returned" issue

  • Promises diverse, interesting, cross-cutting research activities

  • Complements traditional reading with a new way to interact with documents