You can evaluate the quality of an extractor by comparing the results of one extractor run to a labeled data collection. By comparing the results, you can determine the accuracy and completeness of an extractor against a standard that you establish.
The results of the run are saved in a new subdirectory named result-system-timestamp in the results directory.
The comparison is shown in the Annotation Difference Viewer.
The precision measure is a reflection of the accuracy of your extractor. Your goal is to create an extractor that is as precise as possible.
The recall measure is a reflection of the completeness, or coverage, of your extractor.
Labeled collection details apply only when you are comparing the results of an extractor run to a labeled data collection. The measures that are described in Annotation Difference Viewer are the standard measures that are used in natural language processing.
| Column | Description |
|---|---|
| Resource | Name of the output view. Nested beneath are the following rows: |
| Precision | The percentage of the results that the extractor identified as correct that are correct according to the labeled data collection. The higher the precision, the better the extractor is at extracting results that are annotated in the labeled data collection. For example, if the extractor extracted five phone numbers from an input file but only four of them are correct based on the labeled data collection, the precision is 4/5, or 80%. |
| Recall | The percentage of the correct results in the labeled data collection that the extractor extracted. For example, according to the labeled data collection, there are 8 correct phone numbers. The extractor extracted 5 phone numbers, but only 4 of those 5 numbers are correct. The recall is 4/8, or 50%. |
| F-Measure | A weighted average of the precision and recall measures, computed as 2 × (precision × recall) / (precision + recall). |
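The calculations in the table can be sketched in a few lines of code. This is an illustrative example only (the function names are not part of the product); it reuses the phone-number counts from the examples above: 5 extracted results, 4 of them correct, and 8 correct results in the labeled data collection.

```python
def precision(true_positives: int, extracted_total: int) -> float:
    """Fraction of the extracted results that are correct."""
    return true_positives / extracted_total

def recall(true_positives: int, labeled_total: int) -> float:
    """Fraction of the labeled (correct) results that were extracted."""
    return true_positives / labeled_total

def f_measure(p: float, r: float) -> float:
    """Weighted average (harmonic mean) of precision and recall."""
    return 2 * p * r / (p + r)

# Phone-number example: 5 extracted, 4 correct, 8 in the labeled collection.
p = precision(4, 5)   # 0.8, or 80%
r = recall(4, 8)      # 0.5, or 50%
f = f_measure(p, r)   # approximately 0.615
print(f"precision={p:.0%} recall={r:.0%} f-measure={f:.3f}")
```

Note that the F-measure rewards a balance between the two scores: an extractor with 80% precision and 50% recall scores lower than one with 65% on both.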