Troubleshooting

Table 1. Problems and corrections
Problem number Problem Likely cause Correction
Problem 1: The Text Analytics engine generates incorrect token boundaries.
Likely cause: An incorrect language value was set in the launch configuration.
Correction: Confirm the language of the data collection, and check the value in the launch configuration.
Problem 2: The highlighting for span values in the Result Tree View, Labeled Document Collection Viewer, or File Side-by-Side Differences Viewer is off by a few characters.
Likely cause: The text file encoding of the InfoSphere® BigInsights™ project is not set to UTF-8.
Correction: Set the text file encoding of the InfoSphere BigInsights project to UTF-8:
  1. In the Project Properties, go to the Resources page.
  2. On the Resources page, select UTF-8 as the text file encoding.
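The following sketch illustrates why a wrong text file encoding can shift span highlighting by a few characters. It is an illustration only and not part of the tooling: the sample text and the choice of latin-1 as the wrong encoding are assumptions. Span offsets are computed on correctly decoded text, but a viewer that decodes the same bytes with a single-byte encoding sees every multi-byte UTF-8 character as two characters, so later offsets drift.

```python
# Illustration only: the sample text and the latin-1 codec are assumptions.
text = "Señor José Smith"
raw = text.encode("utf-8")

# Span of "Smith" computed on the correctly decoded text.
begin = text.index("Smith")
end = begin + len("Smith")

# Decoding the same bytes with a single-byte encoding turns each
# two-byte UTF-8 character (ñ, é) into two characters.
wrong = raw.decode("latin-1")

correct_highlight = text[begin:end]       # 'Smith'
shifted_highlight = wrong[begin:end]      # starts two characters too early
drift = wrong.index("Smith") - begin      # 2, one per accented character

print(correct_highlight, repr(shifted_highlight), drift)
```

The drift grows with the number of non-ASCII characters that precede the span, which matches the symptom of highlighting that is off by only a few characters.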
Problem 3: In the InfoSphere BigInsights Tools for Eclipse, an out-of-memory error (OutOfMemoryError) occurs when you run your extractor on a data collection or navigate through the extracted results.
Likely cause: When you run an extractor in the Text Analytics runtime component or call the Text Analytics APIs, the system uses a document-at-a-time execution model, so memory utilization depends on the number of results that are generated for the largest document. However, when you run a Text Analytics extractor in the InfoSphere BigInsights Tools for Eclipse, the results for all documents in the data collection are loaded into main memory so that they can be displayed in the Annotation Explorer and the other result viewers.
Correction: Choose a smaller data collection to test your extractor, or increase the maximum heap size for your Eclipse application. To increase the heap size:
  1. In your Eclipse installation directory, open the eclipse.ini file.
  2. Modify the value of the -Xmx VM argument, and save the file.
  3. Restart Eclipse.
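The edit to eclipse.ini in step 2 can be sketched as a small helper. This is a minimal sketch under stated assumptions: the function name set_max_heap, the sample file content, and the 2048m value are hypothetical, and a suitable heap size depends on your machine. It relies only on the standard eclipse.ini layout, in which each argument sits on its own line and VM arguments follow the -vmargs line.

```python
import re

def set_max_heap(ini_text, new_size):
    """Return eclipse.ini content with the -Xmx value set to new_size, e.g. '2048m'."""
    lines = ini_text.splitlines()
    for i, line in enumerate(lines):
        if re.fullmatch(r"-Xmx\d+[kKmMgG]?", line.strip()):
            lines[i] = f"-Xmx{new_size}"
            break
    else:
        # No -Xmx entry found: append one. VM arguments must follow -vmargs.
        if "-vmargs" not in (l.strip() for l in lines):
            lines.append("-vmargs")
        lines.append(f"-Xmx{new_size}")
    return "\n".join(lines) + "\n"

# Hypothetical file content, used only to demonstrate the helper.
sample = "-vmargs\n-Xms256m\n-Xmx512m\n"
updated = set_max_heap(sample, "2048m")
print(updated)
```

To apply it to a real installation, read eclipse.ini, pass its text through the helper, write the result back, and then restart Eclipse as described in step 3. Keep a backup copy of the original file.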
Problem 4: The Annotation Explorer displays zero results for some extractors.
Likely cause: The extractor results are non-span values and therefore are not shown in the Annotation Explorer.
Correction: No corrective action is required. To examine the non-span values, use the Result Table Viewers.
Problem 5: The Annotation Explorer displays empty spans of zero length, such as [0-0]. When such a span is opened:
  • An empty document is displayed in the Result Editor with Anonymous as the source text object from which the value is extracted.
  • The Result Tree View is opened and Anonymous is displayed in the title bar to indicate the source text object from which the value is extracted.
Likely cause: The extractor contains AQL code in which empty spans are created, for example:

-- Identify matches of a dictionary of common first names
create view FirstName as
  extract dictionary 'FirstNamesDict'
    on D.text as name
  from Document D;

-- Identify matches of a dictionary of common last names
create view LastName as
  extract dictionary 'LastNamesDict'
    on D.text as name
  from Document D;

-- Identify matches of a dictionary of common salutations,
-- for example, 'Mr.', 'Ms.', etc.
create view Salutation as
  extract dictionary 'SalutationDict'
    on D.text as name
  from Document D;

-- Person candidate 1: identify first name immediately followed
-- within 0 tokens by a last name
create view PeopleNames1 as
  select F.name as fName, L.name as lName
  from FirstName F, LastName L
  where FollowsTok(F.name, L.name, 0, 0);

-- Person candidate 2: last names immediately preceded by a salutation
create view PeopleNames2 as
  select '' as fName, L.name as lName
  from Salutation S, LastName L
  where FollowsTok(S.name, L.name, 0, 0);

-- Final view of PeopleNames as a union of PeopleNames1 and PeopleNames2
create view PeopleNames as
  (select P.fName as fName, P.lName as lName from PeopleNames1 P)
  union all
  (select P.fName as fName, P.lName as lName from PeopleNames2 P);

output view PeopleNames;

In the example, PeopleNames2 creates empty text for the fName field to indicate a NULL or empty value. The attributes in PeopleNames1, however, are not empty. Later, a union all AQL construct combines the tuples from PeopleNames1 and PeopleNames2 to create the final output view, PeopleNames. The two views are not union-compatible in a strict sense: the first view returns tuples whose first column is of type Span over Document.text, while the second view returns tuples whose first column is of type Text. However, the Text Analytics run time is not strict in this case and allows such unions, because using empty values to indicate null values is a convenient feature.

When an extractor that contains this kind of AQL code is run, the PeopleNames.fName attribute contains empty span values of zero length, [0-0]. These empty spans correspond to the empty text created in PeopleNames2. When one of these spans is opened in the Result Editor, an empty document is opened with Anonymous in the title. This title indicates that the value is a span over text that was created in the AQL code. It is not a Span over Document.text or a Span over other text that is derived from the input document text, for example by using the AQL detag statement.

For more information, see the AQL Reference in the Information Center.

Problem 6: The Text Analytics Indexer is corrupted, new index files must be created, and refactoring and AQL Doc Hover do not work.
Likely cause: The Text Analytics Indexer can become corrupted in the following scenarios:
  • When you rename a project, module, or file while the workspace is being indexed, the following warning message is shown: "The tooling workspace is being indexed. If you continue, the indexing might not be successful and refactoring feature won't work further. Cancel it and try your action again." If you continue anyway, the indexer becomes corrupted.
  • When you add or delete .aql files, or update .aql file content in quick succession before the indexer indexes the previous changes.
Correction: Generate new index files for the InfoSphere BigInsights Tools for Eclipse workspace:
  1. Close the InfoSphere BigInsights Tools for Eclipse workspace.
  2. Delete all of the existing index files (.idx) in WorkspaceRoot\.metadata\.plugins\com.ibm.biginsight.textanalytics.indexer
  3. Restart the InfoSphere BigInsights Tools for Eclipse workspace.
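
The deletion in step 2 can also be scripted. The following is a minimal sketch, not a supplied utility: the function name delete_index_files is hypothetical, and it assumes that the .idx files sit directly in the indexer folder named above rather than in subfolders. Run something like this only while the workspace is closed. The demonstration at the end uses a throwaway directory instead of a real workspace.

```python
import tempfile
from pathlib import Path

# Folder name taken from step 2; everything else here is illustrative.
INDEXER_DIR = Path(".metadata") / ".plugins" / "com.ibm.biginsight.textanalytics.indexer"

def delete_index_files(workspace_root):
    """Delete *.idx files in the indexer folder and return their names."""
    indexer_dir = Path(workspace_root) / INDEXER_DIR
    removed = []
    if indexer_dir.is_dir():
        for idx_file in sorted(indexer_dir.glob("*.idx")):
            idx_file.unlink()
            removed.append(idx_file.name)
    return removed

# Demonstration against a temporary directory, not a real workspace.
with tempfile.TemporaryDirectory() as workspace_root:
    indexer_dir = Path(workspace_root) / INDEXER_DIR
    indexer_dir.mkdir(parents=True)
    for name in ("a.idx", "b.idx", "notes.txt"):
        (indexer_dir / name).write_text("placeholder")
    removed = delete_index_files(workspace_root)
    remaining = sorted(p.name for p in indexer_dir.iterdir())

print(removed, remaining)
```

Only files with the .idx extension are removed, so any other content of the folder is left untouched.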