Dialogues With Documents: Customized Interactions With Texts


  • Information has been building for many decades
  • Some exciting recent developments in information distribution

    • The Internet

    • The World Wide Web

        • Built upon an Internet foundation
        • Brings new importance to Vannevar Bush's original vision of hypertext

          • ("As We May Think", Atlantic Monthly, July 1945)

    • Digital Libraries
        • Typically built upon a WWW foundation
        • Breathes life into H. G. Wells's vision

          • (World Brain, Doubleday, 1938)



A Pipeline for Information


  • Traditional information flow:

  • Augmented version showing Information Customization:

  • Examples of information customization

    • Bibliographic database searches
    • Interactive data visualization
      • (e.g. work at Pacific Northwest National Laboratory)

    • Computer assisted foreign language learning
      • (e.g. Berleant et al., in Computer Assisted Language Learning, 1997 no. 2)

    • Customized access to documents

  • A few more details:   Berleant and Berghel, "Customizing Information", Computer, Sept. & Oct. 1994



Customized Access to Documents I: HyperBrowser


  • Occurrences of similar words are formed into hyperlinked chains
  • Links connect "most similar sentences sharing a stem"
  • (Except: the end of a chain links back to its beginning)
  • HyperBrowser:

    • Uses preexisting Web clients as interfaces
    • Is prescriptive

      • (Alas, hyperlinks have just one destination)

    • Is interactive
    • Users follow custom pathways through the document
    • Supports low cognitive overhead interaction
    • Has real time navigation
    • Has non-dialogue interaction style
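
The chaining idea can be sketched in a few lines. This is a simplified illustration, not the HyperBrowser implementation: `crude_stem` is a hypothetical stand-in for a real stemmer, and each chain is ordered by document position rather than by the "most similar sentence" criterion the slide describes. The wrap-around from the last occurrence back to the first follows the slide.

```python
import re
from collections import defaultdict

def crude_stem(word):
    """Hypothetical stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_chains(sentences):
    """Map each stem to the ordered indices of the sentences that contain it."""
    chains = defaultdict(list)
    for i, sentence in enumerate(sentences):
        stems = {crude_stem(w) for w in re.findall(r"[a-z]+", sentence.lower())}
        for stem in stems:
            chains[stem].append(i)
    # A chain needs at least two occurrences to be worth linking.
    return {stem: idx for stem, idx in chains.items() if len(idx) > 1}

def link_target(chains, stem, sentence_index):
    """The single destination of a link: next in the chain, wrapping at the end."""
    chain = chains[stem]
    pos = chain.index(sentence_index)
    return chain[(pos + 1) % len(chain)]

sentences = ["Dogs bark.", "Cats purr.", "A dog sleeps.", "Barking dogs rarely bite."]
chains = build_chains(sentences)
print(chains["dog"])                  # [0, 2, 3]
print(link_target(chains, "dog", 3))  # 0 (end of chain links to beginning)
```

Because each hyperlink has only one destination, `link_target` returns a single sentence index; that constraint is what makes the approach prescriptive.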



Which Passage Matches the Query Best?


  • Let M be the current best matching passage
    M = 1st passage in document;
    for each passage P in document
      if (P matches query better than M)
         M=P;
    

  • Measuring degree of match - Algorithm 1
    (Given query Q, passage P)
    Q'=getContentWords(Q);
    P'=getContentWords(P);
    return # words of Q' in P';
    

  • Measuring degree of match - Algorithm 1a
    . . . return # words of P' in Q';
    

  • Example:
        P=Dogs are dogs.
        Q=What are some characteristics of dogs?
        Algorithm 1: match value=1
        Algorithm 1a: match value=2

  • One Solution: a more sophisticated "cosine" algorithm
  • Problems:
    • Many algorithm variations exist
    • It is not always clear which is best
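
A minimal runnable sketch of the best-match scan and of Algorithms 1 and 1a. The stop-word list and the tokenizer are assumptions (the slides do not define `getContentWords`), and matching is case-insensitive with duplicates counted, which is what reproduces the example values above.

```python
import re

# Assumed stop-word list; the slides do not specify one.
STOP_WORDS = {"a", "an", "the", "are", "is", "of", "some", "what"}

def get_content_words(text):
    """Lowercase, keep alphabetic tokens, drop stop words (duplicates kept)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def match_1(query, passage):
    """Algorithm 1: number of query content words that occur in the passage."""
    p_words = set(get_content_words(passage))
    return sum(1 for w in get_content_words(query) if w in p_words)

def match_1a(query, passage):
    """Algorithm 1a: number of passage content words that occur in the query."""
    q_words = set(get_content_words(query))
    return sum(1 for w in get_content_words(passage) if w in q_words)

def best_passage(query, passages, match=match_1):
    """Linear scan keeping the best match seen so far (first passage wins ties)."""
    best = passages[0]
    for p in passages[1:]:
        if match(query, p) > match(query, best):
            best = p
    return best

P = "Dogs are dogs."
Q = "What are some characteristics of dogs?"
print(match_1(Q, P))   # 1
print(match_1a(Q, P))  # 2
```

The two counts differ only in which side is iterated, which is why the same pair of texts can score 1 under one variant and 2 under the other.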



An N-Gram Based Strategy


  • (Given query Q, passage P)
    Q'=getContentWords(Q);
    P'=getContentWords(P);
    Q''=getNgrams(Q',n);
    P''=getNgrams(P',n);
    return match(Q'', P'')
    
    (where match(Q'',P'') is e.g. a cosine measure)
  • getNgrams(WordList, n)
      ngrams={ };
      for each word W in WordList
          ngrams=ngrams + substrings(W,n);
      return ngrams;
    

  • substrings("seminar",4) returns
      semi, emin, mina, inar

  • We've often used n=4
  • Mu (1997) allowed users to set n
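
The n-gram strategy with a cosine measure might look like the sketch below. Several details are assumptions rather than anything stated in the slides: the stop-word list, keeping a word shorter than n as a single gram, and using raw n-gram counts as the vector weights.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; the slides do not specify one.
STOP_WORDS = {"a", "an", "the", "are", "is", "of", "some", "what"}

def get_content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def substrings(word, n):
    """All length-n substrings of word; a shorter word is kept whole (assumption)."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def get_ngrams(words, n):
    """Bag of n-grams over all words, with counts."""
    ngrams = Counter()
    for w in words:
        ngrams.update(substrings(w, n))
    return ngrams

def cosine(a, b):
    """Cosine of the angle between two count vectors; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0

def ngram_match(query, passage, n=4):
    q_grams = get_ngrams(get_content_words(query), n)
    p_grams = get_ngrams(get_content_words(passage), n)
    return cosine(q_grams, p_grams)

print(substrings("seminar", 4))  # ['semi', 'emin', 'mina', 'inar']
print(ngram_match("seminars", "The seminar starts") > 0)  # True
```

Matching on n-grams lets "seminars" and "seminar" score above zero without a stemmer, since they share most of their 4-grams.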



An N-Gram Based Strategy With Variable N


  • (Given query Q, passage P)
    Q'=getContentWords(Q);
    P'=getContentWords(P);
    Q''=getNgrams(Q'); // Q''=getNgrams(Q',n);
    P''=getNgrams(P'); // P''=getNgrams(P',n);
    return match(Q'', P'')
    
    (where match(Q'',P'') is e.g. a cosine measure)
  • getNgrams(WordList) // getNgrams(WordList, n)
      ngrams={ };
      for each word W in WordList
       // ngrams=ngrams + substrings(W,n);
          ngrams=ngrams
                 + substrings(W,round(0.7*length(W)));
      return ngrams;
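
A sketch of the variable-n variant. The slide does not say how `round` treats halves, so this version assumes halves round up and computes `round(0.7*length(W))` in integer arithmetic to avoid floating-point surprises; the floor of 1 on n is also an assumption.

```python
def variable_n(word):
    """round(0.7 * len(word)), halves rounded up, never less than 1 (assumptions)."""
    return max(1, (7 * len(word) + 5) // 10)

def substrings(word, n):
    """All length-n substrings of word, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def get_ngrams(words):
    """n-grams for every word, with n chosen per word from its length."""
    ngrams = []
    for w in words:
        ngrams.extend(substrings(w, variable_n(w)))
    return ngrams

print(variable_n("seminar"))             # 5
print(get_ngrams(["seminar", "dogs"]))   # ['semin', 'emina', 'minar', 'dog', 'ogs']
```

Longer words get longer grams, so a match on a long word requires more shared characters than a match on a short one.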
    



Customized Access to Documents II: CyberBrowser


  • (Ref.: Berghel, Berleant, Foy, & McGuire, Journal of the American Society for Information Science (JASIS), in press)
  • List of "most important words" is automatically extracted
  • Displays chart of which sentences contain which listed words
  • Reader interactively, graphically specifies boolean query on listed words
  • Displays extract of sentences conforming to query
  • CyberBrowser:

    • Was made into a web client
      • (Almost any document display system can be easily made into a web client!)

    • Is nonprescriptive (to a degree!)
    • Is interactive
    • Has rather high cognitive overhead interface
    • Supports real time navigation
    • Has dialogue-like interaction
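
The word list, the word-by-sentence chart, and the boolean extract can be illustrated as follows. This is a sketch under assumptions: importance is approximated by raw frequency of content words, and the boolean query is restricted to "all of these words, none of those", which is narrower than the graphical queries the slide describes.

```python
import re
from collections import Counter

# Assumed stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in"}

def tokenize(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]

def important_words(sentences, k):
    """Assumed importance measure: the k most frequent content words."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    return [w for w, _ in counts.most_common(k)]

def build_chart(sentences, words):
    """chart[i] is the set of listed words that sentence i contains."""
    listed = set(words)
    return [listed & set(tokenize(s)) for s in sentences]

def extract(sentences, chart, require=(), exclude=()):
    """Sentences containing every word in `require` and no word in `exclude`."""
    return [s for s, present in zip(sentences, chart)
            if set(require) <= present and not (set(exclude) & present)]

sentences = ["Dogs chase cats.", "Cats chase mice.", "Dogs sleep."]
words = important_words(sentences, 3)
chart = build_chart(sentences, words)
print(extract(sentences, chart, require=["chase"], exclude=["dogs"]))
# ['Cats chase mice.']
```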



Customized Access to Documents III: MultiBrowsers


  • Feature "multiway lookahead"
  • Displays several subwindows
  • Each subwindow contains a passage
  • Each query refreshes all subwindows with new content
  • The MultiBrowsers:

    • Included standalone, web client, and CGI versions
    • Are nonprescriptive
    • Are interactive
    • Have rather high cognitive overhead interfaces
    • Are slow
    • Support dialogue-like interaction
    • Support multiway lookahead

  • The next step: true document dialogue!
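
Multiway lookahead amounts to returning the k best passages for a query instead of one, so that each subwindow gets its own passage. The sketch below assumes a simple shared-word score; any of the match measures from the earlier slides could be passed in as `score` instead.

```python
import heapq
import re

def overlap(query, passage):
    """Assumed scoring function: number of distinct words shared by both texts."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    p = set(re.findall(r"[a-z]+", passage.lower()))
    return len(q & p)

def lookahead(query, passages, k=3, score=overlap):
    """Return the k best-scoring passages, best first, one per subwindow."""
    return heapq.nlargest(k, passages, key=lambda p: score(query, p))

passages = ["Dogs bark.", "Cats purr.", "Dogs chase cats.", "Birds sing."]
for window, passage in enumerate(lookahead("dogs chase cats and bark", passages)):
    print(window, passage)
# 0 Dogs chase cats.
# 1 Dogs bark.
# 2 Cats purr.
```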





Dialogues With Documents - the Next Phase: Some Specs


  • Should have multiway lookahead

    • Multiple responses raise probability of a good one
    • . . . and more than one good one is possible
    • Fruitless queries will be less frequent
    • Hence: each user action provides more value

  • Should it have "split subwindows"?

    • Retrieved passage visible while scrolling upper half
    • Retrieved passage easy to find when scrolling lower half
    • Variations on the basic concept possible
    • Consider standard scrollable text boxes -

      • Are split subwindows really better??
      • (They may sound better but this is unproven)



Dialogues With Documents - the Next Phase: Specs II


  • Navigation must be easy and flexible

    • Useful, low cognitive overhead option:
        - Clicking on retrieved passage makes it the next query

    • Useful, expressive option:
        - Type-in box allows highly flexible queries

    • History navigation:
        - Important in web browsers . . . same for document dialogue?

    • "Sticky" subwindows retain their contents
        - The composition metaphor can be valuable
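
The navigation options above can be modeled as a small session object. Everything here is a hypothetical sketch: the class name, its methods, and the toy `retrieve` function are illustrative, not part of any existing system. Clicking a passage reuses it as the next query, `back` replays the previous query, and sticky subwindows are skipped on refresh.

```python
import re

class DialogueSession:
    """Hypothetical session state: subwindows, sticky flags, and query history."""

    def __init__(self, retrieve, n_windows=3):
        self.retrieve = retrieve           # function: query -> ranked passages
        self.windows = [None] * n_windows  # one passage per subwindow
        self.sticky = set()                # indices of subwindows that keep content
        self.history = []                  # queries issued so far

    def ask(self, query):
        """Typed-in query: record it and refresh the non-sticky subwindows."""
        self.history.append(query)
        self._refresh(query)

    def click(self, window):
        """Clicking a retrieved passage makes it the next query."""
        if self.windows[window] is not None:
            self.ask(self.windows[window])

    def back(self):
        """History navigation: drop the latest query and replay the one before."""
        if len(self.history) > 1:
            self.history.pop()
            self._refresh(self.history[-1])

    def _refresh(self, query):
        # Skip passages already held in sticky subwindows and the query itself.
        shown = {self.windows[i] for i in self.sticky} | {query}
        fresh = (p for p in self.retrieve(query) if p not in shown)
        for i in range(len(self.windows)):
            if i not in self.sticky:
                # Keep the old passage if no replacement is available.
                self.windows[i] = next(fresh, self.windows[i])

PASSAGES = ["Dogs bark.", "Dogs chase cats.", "Cats purr.", "Birds sing."]

def retrieve(query):
    """Toy retrieval: rank passages by number of shared words (stable order)."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return sorted(PASSAGES,
                  key=lambda p: -len(words & set(re.findall(r"[a-z]+", p.lower()))))

session = DialogueSession(retrieve, n_windows=2)
session.ask("dogs")
print(session.windows)  # ['Dogs bark.', 'Dogs chase cats.']
session.sticky.add(0)   # pin the first subwindow
session.click(1)        # 'Dogs chase cats.' becomes the query
print(session.windows)  # ['Dogs bark.', 'Cats purr.']
```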



Dialogues With Documents - the Next Phase: Specs III


  • Response times must be brisk

    • An important HCI guideline - therefore:
    • Update one or more subwindows immediately
    • Retain each passage until its replacement is ready
    • Use FastCGI protocol
    • (Or modify server via its API)

  • Displayed content must have high quality

    • Fill all subwindows - no blanks
    • Avoid duplication among subwindows
    • Have a meta-dialogue subwindow
    • Have a clickable permuted index subwindow
    • Include "sophisticated" retrieval algorithms
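
A permuted (keyword-in-context) index lists every content word alphabetically with a little surrounding text. The sketch below shows one way such an index could be built; the stop-word list, the context width, and the row layout are assumptions, and making rows clickable is left to the interface.

```python
import re

# Assumed stop-word list for illustration.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in"}

def permuted_index(sentences, width=3):
    """Rows of (keyword, left context, right context, sentence number), sorted."""
    rows = []
    for num, sentence in enumerate(sentences):
        words = re.findall(r"[A-Za-z]+", sentence)
        for i, w in enumerate(words):
            if w.lower() in STOP_WORDS:
                continue
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            rows.append((w.lower(), left, right, num))
    return sorted(rows)

for keyword, left, right, num in permuted_index(["Dogs chase cats.", "Cats sleep."]):
    print(f"{left:>12} [{keyword}] {right:<12} (sentence {num})")
```

The sentence number in each row is what a click would use to jump to the passage.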



Dialogues With Documents - the Next Phase: Design Strategy


  • System closely associated with a server
      - so it's runnable from Netscape/Microsoft browsers

  • System built from reusable software components
      - to support long term (unknowable) implementation needs

  • System uses FastCGI protocol
      - so document can be preprocessed just once per session

  • Build index in preprocessing
      - user friendly and supports retrieval
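
The "preprocess once, query many times" idea can be illustrated with a small inverted index that is built when a session starts and reused for each query. This is a sketch only: the class and method names are hypothetical, and the FastCGI wiring that would keep the object alive between requests is not shown.

```python
import re
from collections import defaultdict

class DocumentIndex:
    """Hypothetical per-session index: built once, then reused for every query."""

    def __init__(self, passages):
        self.passages = passages
        self.postings = defaultdict(set)  # word -> indices of passages containing it
        for i, passage in enumerate(passages):
            for w in re.findall(r"[a-z]+", passage.lower()):
                self.postings[w].add(i)

    def candidates(self, query):
        """Indices of passages sharing at least one word with the query."""
        hits = set()
        for w in re.findall(r"[a-z]+", query.lower()):
            hits |= self.postings.get(w, set())
        return sorted(hits)

index = DocumentIndex(["Dogs bark.", "Cats purr.", "Dogs chase cats."])
print(index.candidates("cats"))  # [1, 2]
print(index.candidates("fish"))  # []
```

Only the candidate passages then need to be scored by a match measure, which helps keep response times brisk on book-length documents.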



Some Other Related Work I


  • HCI and IR and . . .
  • Automatic abstracting and extracting

    • Examined starting in '50s
    • Can involve passage retrieval
    • Can be done in a custom manner
    • Focus tends toward batch processing
    • Focus tends toward template filling

  • Electronic books

    • Book length documents are a shared concern
    • Ebooks could support document dialogue
    • Focus tends toward hardware, currently
    • This may change



Some Other Related Work II


  • Meta-dialogue

    • Examined as early as ELIZA
    • Important in CAI
    • Field currently unstructured
    • True dialogue seems to require it

  • Question answering

    • E.g. FAQ Finder and related systems
    • Similarities to document dialogue, but . . .
    • Purpose is different
    • More like web+browser than document dialogue



Dialogues With Documents - Conclusion


  • Dialogue is more engaging than reading
      - because it customizes what is presented

  • Users already scan and pick out passages
      - for online material (Nielsen, CACM, Jan. 1999)

  • Works with books or sets of web documents
      - could handle the "too many web documents returned" issue

  • Promises diverse, interesting, cross-cutting research activities
  • Complements traditional reading with a new way to interact with documents