Creating the Text Analytics extractor environment

You begin with a goal in mind, such as determining which sites refer to the Watson machine or conducting a financial analysis of IBM quarterly reports. To accomplish these goals, you must analyze data, frequently large amounts of it. Rather than reading through that data manually, you can create criteria and patterns and use Text Analytics to help with the analysis.

Procedure

  1. Create an InfoSphere® BigInsights™ project in InfoSphere BigInsights Tools for Eclipse.

    When you create the project, its structure is generated automatically. You can see this structure in the Project Explorer view, where the folder with the project name contains the files that you need to continue. Copy or drag the documents that you want to analyze into the textAnalytics folder.

    If you are using data that is captured from tweet feeds or other sources that might produce a large volume of data, consider starting the extraction development process by using a subset of your data instead of downloading all of the data from the cluster. From the InfoSphere BigInsights Console, you can select the data sampling or the data subset applications to reduce the amount of data to work with.

  2. If you are not already in the InfoSphere BigInsights Text Analytics Workflow perspective, switch to that perspective.
  3. In the InfoSphere BigInsights Text Analytics Workflow perspective, open the Extraction Tasks tab. Expand the step for selecting the documents that you want to analyze, and click Browse Workspace to find them. After you select the documents, they are available for analysis. Select a file, and click Open. The file opens in the editor pane.

    As with any data analysis, if you have some familiarity with the content, you already have an idea of the information that you want to extract.
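
The suggestion in step 1 to start extraction development with a subset of a large data set can also be approximated locally. The following Python sketch is not the data sampling or data subset application from the InfoSphere BigInsights Console. It is a stand-in that uses reservoir sampling to pick a fixed number of records in a single pass. The function name sample_records, its parameters, and the generated tweet records are hypothetical and serve only as an illustration.

```python
import random
from typing import Iterable, List


def sample_records(records: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Return up to k records chosen uniformly at random in one pass.

    Hypothetical helper: uses reservoir sampling, so the full input
    never has to be held in memory at once.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    reservoir: List[str] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds are inclusive
            if j < k:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Stand-in for a large feed; real input might be lines read from a file.
    tweets = (f"tweet {n}" for n in range(10_000))
    subset = sample_records(tweets, k=100)
    print(len(subset))
```

You might write the resulting subset to files and copy them into the textAnalytics folder as described in step 1. A fixed seed keeps the subset stable between runs, which helps when you compare extractor results during development.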