You begin with a goal in mind, such as
determining which sites refer to the Watson machine or conducting a
financial analysis of IBM quarterly reports. To accomplish these
goals, you must analyze data, frequently large amounts of it.
Instead of reviewing all of that data manually, you can create
criteria and patterns and use Text Analytics to help with the analysis.
Procedure
- Create an InfoSphere® BigInsights™ project in
InfoSphere BigInsights Tools for Eclipse.
After you create the project, a project structure is created for
you automatically. You can see this structure in the Project
Explorer view, where the folder with the project name contains
the files that you need to continue. Copy or drag the documents that
you want to analyze into the textAnalytics folder.
If you are using data that is captured from tweet feeds or other
sources that might produce a large volume of data, consider
starting the extraction development process by using a subset of
your data instead of downloading all of the data from the cluster.
From the InfoSphere BigInsights
Console, you can select the data sampling or the data subset
applications to reduce the amount of data to work with.
- If you are not already in the InfoSphere BigInsights
Text Analytics Workflow perspective, switch to that perspective.
- In the InfoSphere BigInsights Text Analytics Workflow perspective, open the Extraction Tasks tab. Select the
documents that you want to analyze. Expand this step, and click Browse Workspace to find the documents.
When you select the documents, they are available for you to
analyze. Select a file, and click Open.
That file opens in the editor pane.
As with any data analysis, if you have some
familiarity with the content, you already have an idea of the
information that you want to extract.
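
The advice in the first step about starting with a subset of your data can also be sketched locally. The following Python example is only an illustration and is not the data sampling application from the InfoSphere BigInsights Console. It assumes that your documents can be read as a stream of text records (for example, one tweet per line), and the function name `sample_records` is hypothetical. It uses reservoir sampling, so the full data set never has to be held in memory.

```python
import random
from typing import Iterable, List, Optional


def sample_records(records: Iterable[str], k: int,
                   seed: Optional[int] = None) -> List[str]:
    """Return up to k records chosen uniformly at random.

    Hypothetical helper for illustration; it uses reservoir sampling,
    so the input is read once and only k records are kept in memory.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # a fixed seed makes the subset repeatable
    reservoir: List[str] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Replace an existing entry with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir


# Example: keep 100 records out of a large stream of made-up tweets.
tweets = (f"tweet {n}" for n in range(10_000))
subset = sample_records(tweets, k=100, seed=42)
```

You might then write the sampled records to a file and copy that file into the textAnalytics folder, so that extractor development starts on a small, representative set of documents.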