Optimizing extractors through profiling

The Text Analytics Optimizer analyzes an extractor to identify an efficient execution plan, but that plan can often be improved further. The Text Analytics Profiler measures the throughput of an extractor as it runs against a data collection and identifies potential performance bottlenecks. These bottlenecks point to AQL rules that might need to be tuned manually to run faster. You can use profile configurations to find optimizations for your extractors, and you can create and save more than one configuration so that you can profile an extractor against different data collections.
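The Profiler's internal method is not described in this topic. As a rough mental model only, the following Python sketch times a set of toy "rules" over a list of documents, reports throughput, and ranks the rules by their share of total rule time, which is the kind of signal that points to rules worth tuning. The `profile_rules` function, the regex stand-ins for AQL rules, and the choice of documents per second as the throughput unit are all hypothetical, not the Profiler's API or its actual metrics.

```python
import re
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for an AQL rule: it takes the text of one
# document and returns the strings it extracts.
Rule = Callable[[str], List[str]]


def profile_rules(rules: Dict[str, Rule],
                  docs: List[str]) -> Tuple[float, List[Tuple[str, float]]]:
    """Run every rule over every document once and time each rule.

    Returns (documents per second, ranking), where ranking lists
    (rule name, fraction of total rule time) with the slowest rule first.
    Illustrative only; not the Text Analytics Profiler's method.
    """
    per_rule = {name: 0.0 for name in rules}
    start = time.perf_counter()
    for doc in docs:
        for name, rule in rules.items():
            t0 = time.perf_counter()
            rule(doc)
            per_rule[name] += time.perf_counter() - t0
    elapsed = time.perf_counter() - start
    throughput = len(docs) / elapsed if elapsed > 0 else float("inf")
    total = sum(per_rule.values())
    # Rank rules by their share of the total rule time, largest first.
    ranking = sorted(
        ((name, t / total if total > 0 else 0.0) for name, t in per_rule.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return throughput, ranking


if __name__ == "__main__":
    # Two toy regex "rules" standing in for AQL rules.
    rules = {
        "CapitalizedWord": lambda d: re.findall(r"\b[A-Z][a-z]+\b", d),
        "PhoneNumber": lambda d: re.findall(r"\b\d{3}-\d{4}\b", d),
    }
    docs = ["Call Alice Smith at 555-1234 or Bob at 555-9876."] * 2000
    docs_per_sec, ranking = profile_rules(rules, docs)
    print(f"throughput: {docs_per_sec:.0f} docs/s")
    for name, share in ranking:
        print(f"{name}: {share:.1%} of rule time")
```

Ranking by share of time rather than by absolute time keeps the comparison meaningful across data collections of different sizes.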

Before you begin

Procedure

  1. Right-click a Text Analytics project, and click Profile As > Profile Configurations.
  2. In the Profile Configurations window, right-click Text Analytics in the navigation pane and click New. A new profile configuration is created and selected in the navigation pane, and its Main tab opens in the content pane.
    1. If the project is a modular AQL project, select the modules to profile.
    2. Select the language of the data collection.
    3. Click Browse Workspace or Browse File System to select the location of the data collection. If one or more of the selected modules require external view data, select a JSON file in the format that is described in Data collection formats.
      CAUTION:
      In the file selection window, if you select the Show All Files check box, ensure that the file you select is in one of the supported formats that are described in Data collection formats. Selecting a file in an unsupported format might produce unexpected results.
    4. Optional: To run the Profiler for longer than the default of 60 seconds, set a value for Minimum time to run, which is the minimum length of time that the Profiler runs. Choose a value that is long enough for the Profiler to collect meaningful performance data for the size of the data collection.
  3. The configuration also has two more tabs: External Tables and External Dictionaries. If the project is a modular AQL project that contains external dictionaries or external tables, they are listed on these tabs, where you can supply data for the external artifacts that are declared in the extractor. For more information, see External tables and External dictionaries tabs in the Text Analytics run configuration.
  4. Click Profile to profile the AQL code. The Profiler runs the extractor over the entire data collection repeatedly until at least the duration specified in Minimum time to run has elapsed, and then displays the results in the Profiler View.
    Note: The Profiler never stops in the middle of a pass through the data collection. If the Minimum time to run elapses during a pass, the Profiler continues until it reaches the end of the data collection.
  5. Use the Profiler output to tune the extractor. For details about the Profiler output and how to tune the extractor, see AQL Profiler reports.
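The stopping rule that is described in the note under step 4 can be modeled with a short loop. This is a minimal Python sketch under stated assumptions, not the Profiler's implementation: `run_passes`, its parameters, and the injectable `clock` are hypothetical names. The elapsed time is checked only after a complete pass, so the pass in progress always finishes, and at least one full pass runs even when the minimum time is very short.

```python
import time
from typing import Callable, List


def run_passes(process: Callable[[str], object],
               docs: List[str],
               min_seconds: float = 60.0,
               clock: Callable[[], float] = time.monotonic) -> int:
    """Repeat complete passes over docs until at least min_seconds elapse.

    Elapsed time is checked only after a pass ends, so a pass that is
    under way always finishes. At least one pass always runs.
    Returns the number of completed passes.
    """
    if not docs:
        raise ValueError("the data collection is empty")
    start = clock()
    passes = 0
    while True:
        for doc in docs:  # one full pass over the data collection
            process(doc)
        passes += 1
        if clock() - start >= min_seconds:
            return passes


if __name__ == "__main__":
    # Simulated clock: each processed document "takes" 1 second.
    now = [0.0]

    def fake_process(doc: str) -> None:
        now[0] += 1.0

    n = run_passes(fake_process, ["d1", "d2", "d3"], min_seconds=7.0,
                   clock=lambda: now[0])
    print(n, now[0])  # prints: 3 9.0
```

Injecting the clock makes the rule easy to check deterministically: with three documents that each take one simulated second and a 7-second minimum, the loop stops after the third pass at 9 seconds, overshooting the minimum rather than cutting the pass short.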