Writing extractor code

Write your extractor code by using Annotation Query Language (AQL).

About this task

AQL is a declarative and expressive language that offers a comprehensive set of built-in extraction primitives. These primitives range from feature extraction primitives, such as regular expressions, dictionaries, and parts of speech, to set-level primitives, such as union, join, and aggregation, that manipulate and combine these features.

The top-level components of an AQL extractor are its views, which are logical statements that define, but do not necessarily compute, a set of tuples. An extractor consists of a collection of views, each of which defines a relation. Some of these views are designated as output views, while others are non-output views. An extractor is sometimes referred to as an annotator. It is an information extraction program that extracts structured information from unstructured or semistructured text by using AQL constructs.

Although AQL is a powerful language, you might need to perform operations on extracted values that AQL does not support. For example, you might want to normalize a value by converting it to lowercase and removing multiple consecutive white spaces, or call a web service to validate an extracted credit card number. For these cases, AQL lets you use custom functions, called user-defined functions (UDFs), in extraction rules. The classes that implement these UDFs are packaged in UDF JAR files.
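As an illustration of the normalization example, the following is a minimal sketch of the kind of Java method that a UDF JAR file might contain. The class name TextNormalizer and the method name normalize are hypothetical and are not part of the product. The sketch assumes that the UDF is exposed as a static method that takes and returns a String. Check the AQL Reference for the exact requirements on UDF classes and for the statement that registers the function in your AQL code.

```java
import java.util.Locale;

// Hypothetical UDF class; the name is illustrative only.
// Assumption: when packaged in a UDF JAR file, the class would be
// declared public in its own source file.
final class TextNormalizer {

    /**
     * Converts the value to lowercase, trims leading and trailing
     * white space, and replaces each run of consecutive white space
     * characters with a single space. Returns null for null input.
     */
    public static String normalize(String value) {
        if (value == null) {
            return null;
        }
        return value.trim()
                    .replaceAll("\\s+", " ")
                    .toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, a span such as "  Acme   Corp " would be normalized to "acme corp". Locale.ROOT is used so that the lowercase conversion does not depend on the default locale of the machine that runs the extractor.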

Procedure

  1. Use the AQL Reference to write AQL code in the AQL Editor.
  2. Use the InfoSphere BigInsights Text Analytics tools to help you develop specific aspects of your AQL code, including: