You can use a launch configuration to run an AQL extractor against a data collection. If you want to run the extractor against more than one collection, you can create a launch configuration for each collection and save them for future use.
After an extractor runs, one result file is generated for each input document. All result files have the .strf file extension. Result files are stored in the file system so that results can be displayed in the Text Analytics tool.
Input files that do not have annotations are not serialized and are not considered for processing in the InfoSphere BigInsights Tools for Eclipse.
If the input document is a file in a directory and its name contains no special characters, the result file (.strf) has the same name as the input file. For some data collections, however (such as a .zip file, a directory with a subdirectory, or a del file with internal labels that contain special characters), the name of the result file cannot be mapped directly to the input file name. This occurs because a label can contain special characters that are not allowed in a file name, such as '/', '?', '%', '*', ':', '|', '<', '>', '\', and '"'.
As a result, the Text Analytics system either flattens the hierarchy (in the case of a .zip file or a subdirectory) or normalizes the file name (in the case of del files whose input document labels are, for example, URLs) by replacing the path separators and special characters with the character '~'.
In rare situations, the labels of two input documents can map to the same result file name. In that case, the result files are differentiated by a version number, for example MyDoc.strf and MyDoc(1).strf.
Results are also saved in a result-<system-timestamp> subdirectory in the result directory for the project.
To compare the performance of two runs, see Comparing the results of one extractor run to the results of another run. To evaluate the differences between results against a labeled data collection, see Evaluating the quality of an extractor by combining the labeled data collection with the Annotation Difference Viewer.
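The naming rules for result files described above can be illustrated with a short Python sketch. It is not the Text Analytics implementation, whose exact rules are not documented here. The function names normalize_label and result_file_names are hypothetical, and the sketch assumes that every listed special character is replaced one-for-one with '~', that the label is otherwise kept unchanged, and that version numbers start at (1) for the second document that maps to the same name.

```python
# Characters that the text lists as not allowed in a file name.
SPECIAL_CHARS = '/?%*:|<>\\"'


def normalize_label(label: str) -> str:
    """Replace each special character in a document label with '~'.

    Assumption: one '~' per replaced character; the real tool may differ.
    """
    return "".join("~" if ch in SPECIAL_CHARS else ch for ch in label)


def result_file_names(labels):
    """Map each label to a .strf name, adding (n) when names collide.

    Assumption: the first document keeps the plain name and later
    duplicates get (1), (2), ... in input order.
    """
    seen = {}   # normalized base name -> number of times already used
    names = []
    for label in labels:
        base = normalize_label(label)
        count = seen.get(base, 0)
        seen[base] = count + 1
        suffix = f"({count})" if count else ""
        names.append(f"{base}{suffix}.strf")
    return names


if __name__ == "__main__":
    # Two different labels that normalize to the same base name.
    for name in result_file_names(["docs/MyDoc", "docs:MyDoc", "Other"]):
        print(name)
```

In this sketch, the labels docs/MyDoc and docs:MyDoc both normalize to docs~MyDoc, so the second result file receives the (1) suffix. This is the reason a result file name cannot always be mapped back to a single input label.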