Discovering patterns in text input

Pattern discovery, a feature included in the InfoSphere® BigInsights™ Tools for Eclipse, identifies contextual clues within documents in a data collection that help you refine the accuracy and coverage of an extractor.

Procedure

  1. Define the text to mine for patterns by using AQL. For example, the following AQL view, PhoneCandidateContext, selects the context of each phone candidate, that is, the region of text made up of the four tokens immediately preceding it:
    create view PhoneCandidateContext as
    select LeftContextTok(P.num, 4) as context
    from PhoneCandidate P;

    output view PhoneCandidateContext;
  2. Run a pattern-discovery configuration that reflects your requirements. If the configuration does not exist, create one.
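
The view in step 1 reads from a view named PhoneCandidate that has a span column named num. This topic does not define that view. As a minimal sketch, and assuming a simple North American number format chosen only for illustration, such a view might look like the following fragment. The regular expression is an assumption, not the definition used by any shipped extractor:

    -- Hypothetical definition of the PhoneCandidate view.
    -- The pattern matches numbers such as 555-123-4567 and is illustrative only.
    create view PhoneCandidate as
    extract regex /[0-9]{3}-[0-9]{3}-[0-9]{4}/
        on D.text as num
    from Document D;

A deliberately loose candidate pattern like this one tends to match non-phone numbers too, which is the situation where the contexts collected in step 1 help you decide which surrounding tokens separate true matches from false ones.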

Results

The Pattern Discovery View, Pattern Context View, and Expanded Context View display the results.