Write your extractor code by using Annotation Query Language (AQL).
AQL is a declarative and expressive language that offers a comprehensive set of built-in extraction primitives. These primitives range from feature extraction primitives, such as regular expressions, dictionaries, and part-of-speech, to set-level primitives, such as union, join, and aggregation, which manipulate and combine the extracted features.
The top-level components of an AQL extractor are its views, logical statements that define but do not necessarily compute a set of tuples. An extractor consists of a collection of views, each of which defines a relation. Some of these views are designated as output views, while others are non-output views. An extractor, sometimes referred to as an annotator, is an information extraction program that extracts structured information from unstructured or semistructured text by using AQL constructs.
Even though AQL is a powerful language, you might want to perform operations on extracted values that AQL does not support. For example, you might want to normalize a value by converting it to lowercase and collapsing multiple consecutive white spaces, or make a web-service call to validate an extracted credit card number. For such cases, AQL lets you call custom functions, called user-defined functions (UDFs), from extraction rules. UDF JAR files are the JAR files that contain the classes that implement these UDFs.
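As an illustration of the normalization example, the following is a minimal sketch of a Java class that could be packaged into a UDF JAR file. The class name `ValueNormalizer` and the method name `normalize` are hypothetical, and the sketch assumes that the extracted value reaches the method as a plain `String`. The AQL statement that declares the function and points it at the JAR file is not shown here.

```java
import java.util.Locale;

// Hypothetical UDF class; the name and signature are assumptions for illustration.
public class ValueNormalizer {

    // Converts a value to lowercase and collapses runs of white space
    // into a single space. Returns null when the input is null.
    public static String normalize(String value) {
        if (value == null) {
            return null;
        }
        return value.trim()
                    .replaceAll("\\s+", " ")   // collapse consecutive white space
                    .toLowerCase(Locale.ROOT); // locale-independent lowercase
    }

    public static void main(String[] args) {
        // prints "new york city"
        System.out.println(normalize("  New   YORK \t City "));
    }
}
```

Keeping the method free of side effects means that the same input always yields the same output. The web-service validation case is different, because it depends on an external system, so it likely needs its own handling for timeouts and failures.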