A labeled data collection is the standard way to evaluate the accuracy and completeness of an extractor.
Text Analytics supports labeled data collections for all data collection formats except CSV with header and JSON.
You can create a labeled data collection from the results of an extractor or from a data collection.
| To create a labeled data collection from: | Perform the following steps: |
|---|---|
| A data collection | |
| Extracted results | Tip: If you run an extractor and it produces reasonably good results, consider creating a labeled data collection from the extracted results for later use. |
You must label the text in the data collection so that the Text Analytics system can compare the labeled text with the extractor results. Based on this comparison, the Text Analytics system computes labeled collection measures, which you can use to evaluate the quality of your extractor.
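
To make the comparison concrete, here is a minimal sketch of how labeled text could be compared with extractor results. It is not the Text Analytics implementation: the `Span` type, the `collection_measures` function, the exact-match rule, and the choice of precision, recall, and F1 as measures are all assumptions made for illustration. Precision stands in for accuracy and recall for completeness.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical annotation record: document, character offsets, and type."""
    doc: str
    begin: int
    end: int
    label: str


def collection_measures(labeled: set[Span], extracted: set[Span]) -> dict[str, float]:
    """Compare extractor output with labels using exact span matching (an assumption)."""
    # A result counts as correct only if document, offsets, and type all match a label.
    true_positives = len(labeled & extracted)
    # Precision: share of extracted results that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of labeled spans that the extractor found.
    recall = true_positives / len(labeled) if labeled else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example data: four labeled spans and four extracted spans.
labeled = {
    Span("doc1.txt", 0, 10, "Person"),
    Span("doc1.txt", 25, 32, "Organization"),
    Span("doc2.txt", 4, 12, "Person"),
    Span("doc2.txt", 40, 48, "Person"),        # missed by the extractor
}
extracted = {
    Span("doc1.txt", 0, 10, "Person"),
    Span("doc1.txt", 25, 32, "Organization"),
    Span("doc2.txt", 4, 12, "Person"),
    Span("doc2.txt", 60, 70, "Organization"),  # not in the labels
}

measures = collection_measures(labeled, extracted)
print(measures)
```

In this made-up data, three of the four extracted spans match a label and three of the four labeled spans are found, so precision and recall are both 0.75. A real comparison might also credit partially overlapping spans, which this sketch does not.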