AQL Profiler reports

In some cases, the Text Analytics optimizer cannot choose the most efficient execution plan. When that is the case, you can use the AQL Profiler to identify which portion of the plan can be improved through manual tuning. The AQL Profiler is included in the InfoSphere® BigInsights™ Eclipse Tooling.

The Profiler generates reports that can help you troubleshoot performance problems in the AQL code. For help in determining whether your extractor requires performance tuning, and how to solve common performance problems, see Improving extractor performance.

The Profiler also calculates the throughput of the extractor (in KB/second) by dividing the size of the data that was processed by the total duration of the Profiler execution. A higher value is better: more data is processed per second, so the extractor is faster. After you tune the AQL hot spots that the Profiler identifies, rerun the Profiler and verify that the throughput value has increased. You might need to repeat this process until you are satisfied with the extractor performance.
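The throughput calculation described above can be reproduced by hand when you compare two Profiler runs. The following Python sketch is only an illustration, not part of the Profiler: the function name is hypothetical, and it assumes that 1 KB is 1024 bytes, which this topic does not specify.

```python
def throughput_kb_per_second(bytes_processed: int, duration_seconds: float) -> float:
    """Return throughput as data size (KB) divided by total run duration (seconds).

    Assumption: 1 KB = 1024 bytes. The Profiler may use a different convention.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return (bytes_processed / 1024) / duration_seconds


# Hypothetical figures: the same 50 MB collection profiled before and after tuning.
collection_bytes = 50 * 1024 * 1024
before_tuning = throughput_kb_per_second(collection_bytes, 100.0)
after_tuning = throughput_kb_per_second(collection_bytes, 80.0)

print(f"before: {before_tuning:.1f} KB/s, after: {after_tuning:.1f} KB/s")
```

With these made-up numbers the throughput rises from 512.0 to 640.0 KB/second, which is the kind of increase to look for after each tuning pass.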

The Profiler generates the following reports:

Top 25 Views by Execution Time report

Helps you understand which views took the longest time to run. For a moderately complex extractor, a view that takes more than 5% of the execution time is a potential hot spot. (In general, you can ignore the time taken by 'Tokenization and POS Tagging', which is expected to take longer.)

These views are good candidates for manual tuning in order to improve their performance.
Remember: Not all views are equal. Some take more than 5% of the execution time, and some take less. Use good judgment when you evaluate the amount of resources that a view takes.
Tip: Focus on optimizing the most expensive views.
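
The 5% guideline above can be applied mechanically to the figures in this report. The following Python sketch is an illustration only: the function name, the view names, and the timings are hypothetical, and it assumes that you have copied the per-view execution times out of the report into a dictionary. Each share is computed against the total execution time, and 'Tokenization and POS Tagging' is skipped as described above.

```python
# Entries that are expected to be expensive and can usually be ignored.
IGNORED = {"Tokenization and POS Tagging"}


def find_hot_spots(view_times: dict[str, float], threshold: float = 0.05) -> list[tuple[str, float]]:
    """Return (view, share of total time) pairs above the threshold, most expensive first."""
    total = sum(view_times.values())
    if total <= 0:
        return []
    hot = [
        (name, seconds / total)
        for name, seconds in view_times.items()
        if name not in IGNORED and seconds / total > threshold
    ]
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical per-view execution times in seconds.
view_times = {
    "Tokenization and POS Tagging": 40.0,
    "PersonName": 30.0,
    "PhoneNumber": 26.0,
    "Email": 4.0,
}

for name, share in find_hot_spots(view_times):
    print(f"{name}: {share:.0%} of execution time")
```

The threshold is a starting point rather than a rule, so treat the output as a list of views to inspect first, not as a verdict.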

Top 25 documents by running time report

Displays the average time to annotate each document in the data collection. This summary can help you isolate the documents that take longer to annotate, so that you can investigate why a particular type of content takes more time.
Remember: Some documents take longer to process than others because they are larger, the density of results is higher, or the document exhibits some characteristic that makes the extractor slower.

Top 25 documents over 1KB by normalized running time report

Displays the normalized running time per MB of text for documents larger than 1 KB. A higher number means that the extractor is slower on that document. This report also displays the extractor throughput. The higher the throughput, the faster the extractor.
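
To see why normalizing by size matters, the ranking in this report can be approximated as follows. This Python sketch is an illustration under stated assumptions, not the Profiler's own computation: the function names and document figures are hypothetical, running time is assumed to be in seconds, and 1 MB is assumed to be 1024 * 1024 bytes.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def seconds_per_mb(size_bytes: int, seconds: float) -> float:
    """Normalize a document's running time by its size in MB."""
    return seconds / (size_bytes / BYTES_PER_MB)


def slowest_documents(docs, min_bytes: int = 1024, top_n: int = 25):
    """Rank documents larger than min_bytes by normalized running time, slowest first.

    docs is an iterable of (name, size_bytes, seconds) tuples.
    """
    eligible = [
        (name, seconds_per_mb(size, seconds))
        for name, size, seconds in docs
        if size > min_bytes
    ]
    return sorted(eligible, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical documents: (name, size in bytes, running time in seconds).
docs = [
    ("a.txt", 2 * 1024 * 1024, 4.0),  # largest and longest-running document
    ("b.txt", 512 * 1024, 2.0),       # smaller, but slower per MB
    ("tiny.txt", 500, 0.5),           # under 1 KB, so excluded
]

for name, rate in slowest_documents(docs):
    print(f"{name}: {rate:.1f} s/MB")
```

In this example a.txt has the longest raw running time, but b.txt ranks first once size is taken into account, which is the distinction between this report and the report by running time.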