Pattern discovery, a feature included in the InfoSphere®
BigInsights™ Tools for Eclipse, identifies contextual clues within
documents in a data collection that help you refine the accuracy and
coverage of an extractor.
Procedure
- Define the text to mine for patterns by using AQL. For example, the
following AQL view, PhoneCandidateContext, selects the region of text,
or context, that consists of the four tokens immediately preceding each
phone candidate:
create view PhoneCandidateContext as
select LeftContextTok(P.num, 4) as context
from PhoneCandidate P;
output view PhoneCandidateContext;
- Run a pattern-discovery configuration that reflects your requirements.
If the configuration does not exist, create one.
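The PhoneCandidateContext view in the first step reads from a PhoneCandidate view that has a span column named num, which this topic does not show. The following minimal sketch makes the example self-contained. The regular expression is a hypothetical placeholder for illustration only, not the definition used by any shipped extractor, and your product version might require additional statements, such as a module declaration.

```
-- Hypothetical candidate view: matches strings such as 555-123-4567.
-- Replace the regular expression with the one your extractor uses.
create view PhoneCandidate as
extract regex /\d{3}-\d{3}-\d{4}/
    on D.text as num
from Document D;

-- Context to mine: the four tokens immediately to the left of each candidate.
create view PhoneCandidateContext as
select LeftContextTok(P.num, 4) as context
from PhoneCandidate P;

output view PhoneCandidateContext;
```

Only views that are listed in an output view statement produce results, so PhoneCandidate stays internal in this sketch and only the context spans are output for mining.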