In some cases, the Text Analytics optimizer cannot choose the most efficient execution plan. When that is the case, you can use the AQL Profiler to identify which portion of the plan can be improved through manual tuning. The AQL Profiler is included in the InfoSphere® BigInsights™ Eclipse Tooling.
The Profiler generates reports that can help you troubleshoot performance problems in the AQL code. To determine whether your extractor requires performance tuning, and to learn how to solve common performance problems, see Improving extractor performance.
The Profiler also calculates the throughput of the extractor (in KB per second) by dividing the size of the data that was processed by the total duration of the Profiler execution. A higher value is better: more data is processed per second, so the extractor is faster. After you tune the AQL hot spots that the Profiler identifies, rerun the Profiler and verify that the throughput value increases. You might need to repeat this process until you are satisfied with the extractor performance.
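The throughput calculation described above can be reproduced by hand when you compare Profiler runs. The following Python sketch is illustrative only and is not part of the Profiler: the function name and the sample sizes and durations are hypothetical, and it assumes that 1 KB is 1024 bytes.

```python
def throughput_kb_per_sec(bytes_processed: int, duration_seconds: float) -> float:
    """Return throughput in KB per second.

    Assumption: 1 KB = 1024 bytes. The Profiler's own convention may differ.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return (bytes_processed / 1024) / duration_seconds


# Hypothetical measurements: the same 10 MB collection, before and after tuning.
data_size = 10 * 1024 * 1024
before = throughput_kb_per_sec(data_size, 40.0)  # run before tuning
after = throughput_kb_per_sec(data_size, 32.0)   # run after tuning

# Tuning helped only if the throughput went up.
improved = after > before
print(f"before={before:.1f} KB/s, after={after:.1f} KB/s, improved={improved}")
```

Compare runs only when they process the same input data; otherwise a change in throughput might reflect the documents rather than the tuning.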
The Profiler generates the following reports:
Helps you understand which views took the longest time to run. For a moderately complex extractor, a view that takes more than 5% of the execution time is a potential hot spot. (You can usually ignore the time taken by 'Tokenization and POS Tagging', which is expected to take longer.)
Displays the normalized running time per MB of text for documents larger than 1 KB. A higher number means that the extractor is slower on that document. This report also displays the extractor throughput. The higher the throughput, the faster the extractor.
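The two rules of thumb in these report descriptions can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Profiler's implementation: the function names, view names, and timings are hypothetical, the 5% share is computed against the sum of all listed times (including tokenization), and 1 MB is taken as 1024 × 1024 bytes.

```python
from typing import Dict, List, Optional

# Entry that the report description says you can usually ignore.
IGNORED_VIEWS = {"Tokenization and POS Tagging"}
HOT_SPOT_FRACTION = 0.05  # "more than 5% of the execution time"


def find_hot_spots(view_seconds: Dict[str, float]) -> List[str]:
    """Return views above the 5% threshold, slowest first."""
    total = sum(view_seconds.values())
    if total <= 0:
        return []
    hot = [
        name
        for name, secs in view_seconds.items()
        if name not in IGNORED_VIEWS and secs / total > HOT_SPOT_FRACTION
    ]
    return sorted(hot, key=lambda name: view_seconds[name], reverse=True)


def seconds_per_mb(doc_bytes: int, seconds: float) -> Optional[float]:
    """Normalize running time to seconds per MB; skip documents of 1 KB or less."""
    if doc_bytes <= 1024:
        return None
    return seconds / (doc_bytes / (1024 * 1024))


# Hypothetical per-view timings, in seconds.
timings = {
    "Tokenization and POS Tagging": 50.0,
    "PersonName": 30.0,
    "PhoneNumber": 16.0,
    "Zip": 4.0,
}
print(find_hot_spots(timings))
print(seconds_per_mb(2 * 1024 * 1024, 3.0))
```

Normalizing by document size makes documents of different lengths comparable, so a document with an unusually high value points to content that the extractor handles slowly, not merely to a long document.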