Annotation Difference Viewer

The Annotation Difference Viewer displays a side-by-side comparison of two sets of extraction results for the same input file. You can use it to understand how modifying the AQL statements in an extractor affects the results, and how the results compare with a labeled data collection.

Collection Differences subview

In the Annotation Difference Viewer, the Collection Differences subview shows a summary of the differences between the result files in one directory and the reference files in another directory. The reference files can come from another run of the extractor or from a labeled data collection.

Table 1 and Table 2 describe the annotation details. The Total Changed in Result value represents the total number of annotation differences of all types.

Table 1. Annotations details table: By Document tab
Column | Description
Resource | The name of the file. Nested beneath it are the names of the annotation types (view name and attribute name) in the file.
Total Changed in Result | The total number of annotation differences between the result file and the reference file.
Missing in Result | The number of results that the extractor produced in the reference file but not in the result file. If the reference file is a labeled data collection, this is the number of labeled annotations that the extractor did not produce in the result file, so the value reflects the recall of the extractor.
Spurious in Result | The number of results that the extractor produced in the result file but not in the reference file. If the reference file is a labeled data collection, this is the number of results that the extractor produced incorrectly, so the value reflects the precision of the extractor.
Overlapping in Result | The number of results in the result file that overlap with a result in the reference file but are not identical to it. If the reference file is a labeled data collection, this is the number of results that overlap with a labeled annotation without matching it exactly. Two spans count as overlapping only if the overlap is partial; a complete overlap is an exact match.
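The column definitions above can be illustrated with a short Python sketch. It is not the viewer's implementation: it assumes each annotation is a (begin, end) pair of character offsets with an exclusive end, and the function names are hypothetical. It also assumes that a reference span counts as missing only when no result span matches or partially overlaps it.

```python
from collections import Counter


def partially_overlaps(a, b):
    """True if spans a and b share characters but are not identical."""
    return a != b and a[0] < b[1] and b[0] < a[1]


def classify(result_spans, reference_spans):
    """Label every span with one of the difference types from the table."""
    result, reference = set(result_spans), set(reference_spans)
    labels = []
    for span in sorted(result):
        if span in reference:
            labels.append((span, "Unchanged in Result"))
        elif any(partially_overlaps(span, ref) for ref in reference):
            labels.append((span, "Overlapping in Result"))
        else:
            labels.append((span, "Spurious in Result"))
    for ref in sorted(reference):
        # Assumption: a reference span is missing only if nothing in the
        # result matches it exactly or overlaps it partially.
        if ref not in result and not any(partially_overlaps(ref, s) for s in result):
            labels.append((ref, "Missing in Result"))
    return labels


def summarize(labels):
    """Count each difference type and add the Total Changed in Result value."""
    counts = Counter(label for _, label in labels)
    total_changed = sum(n for label, n in counts.items() if label != "Unchanged in Result")
    counts["Total Changed in Result"] = total_changed
    return counts


# Made-up spans for one document and one annotation type.
labels = classify(
    result_spans=[(0, 5), (10, 18), (30, 35)],
    reference_spans=[(0, 5), (12, 18), (40, 44)],
)
counts = summarize(labels)
```

In this made-up input, (10, 18) overlaps the reference span (12, 18) without matching it, (30, 35) has no counterpart in the reference, and (40, 44) has no counterpart in the result.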

Table 2. Annotations details table: By Type tab
Column | Description
Resource | The name of the output view. Nested beneath it are the names of the input document files that contain the view.
Total Changed in Result | The total number of annotation differences between the result file and the reference file.
Missing in Result | The number of results that the extractor produced in the reference file but not in the result file. If the reference file is a labeled data collection, this is the number of labeled annotations that the extractor did not produce in the result file, so the value reflects the recall of the extractor.
Spurious in Result | The number of results that the extractor produced in the result file but not in the reference file. If the reference file is a labeled data collection, this is the number of results that the extractor produced incorrectly, so the value reflects the precision of the extractor.
Overlapping in Result | The number of results in the result file that overlap with a result in the reference file but are not identical to it. If the reference file is a labeled data collection, this is the number of results that overlap with a labeled annotation without matching it exactly. Two spans count as overlapping only if the overlap is partial; a complete overlap is an exact match.
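The tables say that the Missing in Result and Spurious in Result values reflect recall and precision, but they give no formulas. The Python sketch below uses the standard definitions under one explicit assumption that does not come from this documentation: only exact matches count as correct, and each overlapping result pairs with exactly one labeled annotation, so it counts against both measures. The function name and the count of unchanged annotations as an input are hypothetical.

```python
def precision_recall(unchanged, missing, spurious, overlapping):
    """Estimate precision and recall from difference counts.

    Assumption: only exact matches (unchanged) are correct, and every
    overlapping result corresponds to one labeled annotation.
    """
    produced = unchanged + spurious + overlapping  # everything the extractor emitted
    labeled = unchanged + missing + overlapping    # everything in the labeled collection
    precision = unchanged / produced if produced else 0.0
    recall = unchanged / labeled if labeled else 0.0
    return precision, recall


# Made-up counts for one annotation type.
precision, recall = precision_recall(unchanged=6, missing=3, spurious=1, overlapping=1)
```

A more lenient convention would credit overlapping results as partially or fully correct, which would raise both values.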

The Collection Differences subview includes two subsections: Annotation Differences Details and Result Explorer. The Annotation Differences Details subsection displays the annotation difference details for the two selected files. The Result Explorer displays the results for the selected files.

From the Collection Differences subview, you can go to the File Differences Summary by:
  • Double-clicking OutputView.FieldName from the By Document tab.
  • Double-clicking OutputView.FieldName from the By Type tab.

    The File Differences Summary subview displays the summary for all files that are selected in the Result Explorer subsection.
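The By Document and By Type tabs present the same differences under two nestings: files with annotation types beneath them, or output views with files beneath them. The Python sketch below illustrates that idea with made-up records. The record layout, file names, view names, and the nest function are all hypothetical and are not part of the product.

```python
from collections import defaultdict

# Hypothetical difference records: (document, "OutputView.FieldName", difference type).
differences = [
    ("doc1.txt", "Phone.number", "Missing in Result"),
    ("doc1.txt", "Person.name", "Spurious in Result"),
    ("doc2.txt", "Phone.number", "Overlapping in Result"),
    ("doc2.txt", "Phone.number", "Missing in Result"),
]


def nest(records, outer, inner):
    """Count differences in a two-level tree keyed by the given record fields."""
    tree = defaultdict(lambda: defaultdict(int))
    for record in records:
        tree[record[outer]][record[inner]] += 1
    # Convert to plain dicts so the result is easy to print and compare.
    return {key: dict(children) for key, children in tree.items()}


by_document = nest(differences, outer=0, inner=1)  # file first, then annotation type
by_type = nest(differences, outer=1, inner=0)      # annotation type first, then file
```

Both trees hold the same four differences; only the grouping order changes.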

File Differences Summary subview

Table 3 describes the information that is displayed in the File Differences Summary.

The result file is the file that is being compared. The reference file is the file to which the result file is compared.

If you click any row in the File Differences Summary table, the side-by-side view of the result and reference file is displayed.

Table 3. File Differences Summary table
Column | Description
Missing |
Input File Path | The path of the input file.
Name of directory | The name of the directory in which the result file is saved.
Name of directory | The name of the directory in which the reference file is saved.
Type | The type of annotation difference, which is one of the following:

  Unchanged in Result
    The annotation is present in both the result file and the reference file.

  Spurious in Result
    The annotation is present in the result file but not in the reference file. This type of annotation is marked in orange in the table.

  Missing in Result
    The annotation is present in the reference file but not in the result file. This type of annotation is marked in red in the table.

  Overlapping in Result
    The extractor produced a result in the result file that overlaps with a result in the reference file, but the two results are not identical. For example, the extracted text in the result file is (607) 205-4493 (with offsets 444-457), whereas in the reference file it is 205-4493 (with offsets 449-457). This type of annotation is marked in blue in the table.
    Note: The offsets in parentheses identify the span in the input document.
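The phone number example can be checked with a few lines of Python. This is only a sketch: it treats each span as a (begin, end) pair of offsets and assumes that two spans overlap when the later begin offset falls before the earlier end offset. The function name is hypothetical.

```python
def overlap_region(span_a, span_b):
    """Return the shared (begin, end) region of two spans, or None if they are disjoint."""
    begin = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    return (begin, end) if begin < end else None


result_span = (444, 457)     # (607) 205-4493 in the result file
reference_span = (449, 457)  # 205-4493 in the reference file

shared = overlap_region(result_span, reference_span)
# Overlapping in Result: the spans share a region but are not identical.
is_overlapping = shared is not None and result_span != reference_span
```

Here the shared region equals the reference span, so the result span contains the reference span and extends before it, which makes the pair overlapping and not an exact match.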

File Side-by-Side Differences subview

The File Side-by-Side Differences subview of the Annotation Difference Viewer shows the result file beside the reference file. Annotated text is highlighted and color coded to reflect the type of annotation difference that is described in Table 3.