----------------------------------------------------------------------------------------
Test runner for Hansken Extraction Plugins

This is a utility to verify the implementation of a Hansken Extraction Plugin.

To use this utility, three components are required:
(1) A running server instance of an extraction plugin. Pass the host and
    port of this server as arguments to this application.
(2) Input test data. Pass the folder containing the input test data as an
    argument to this application.
(3) Results (expected output). Pass the folder containing the results data
    as an argument to this application.

----------------------------------------------------------------------------------------
Test data

Example test data directory structure:

  tests/
  ├── inputs
  │   ├── example1.raw
  │   ├── example1.text
  │   ├── example1.trace
  │   ├── example2.raw
  │   ├── example2.trace
  │   ├── deferredExample.trace
  │   └── deferredExample
  │       └── searchtraces
  │           ├── deferredExampleSearch.trace
  │           ├── deferredExampleSearch.raw
  │           └── deferredExampleSearch.text
  └── results
      ├── example1.raw.PluginName.trace
      ├── example1.text.PluginName.trace
      ├── example2.raw.PluginName.trace
      └── deferredExample.raw.DeferredPluginName.trace

Input traces and data are stored in the inputs folder, which contains several input sets.
Each set consists of one trace and one or more data streams.
Search traces belonging to a deferred trace are stored in a `searchtraces` folder inside
a folder named after the deferred trace.

Result traces (the expected results) are stored in a separate results folder next to the
inputs folder. The file names in the result set correspond to the file names in the
input set. Note that the name of the plugin is inserted between the data stream file name
and the `.trace` extension. This is useful when a single set of test inputs and results
is maintained for multiple Extraction Plugins.

The test runner will invoke the extraction plugin for each trace-datastream pair.
The test runner collects the plugin output and compares it against the trace defined
in the results folder. If there is a mismatch, the test runner fails with exit
code 1. If all tests pass, the test runner finishes with exit code 0.
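
The compare-and-exit behaviour described above can be sketched in Python as follows.
This is only an illustration, not the test runner's actual code: the function name
`compare_results` is hypothetical, and strict equality of the parsed JSON is an
assumption (the real runner may compare traces differently).

```python
import json


def compare_results(results):
    """Return (exit_code, mismatches) for (name, actual_json, expected_json) triples.

    Hypothetical helper: assumes a test passes only if both traces parse to
    exactly the same JSON value.
    """
    mismatches = []
    for name, actual_json, expected_json in results:
        # Parse both sides so whitespace and key order do not affect the outcome.
        if json.loads(actual_json) != json.loads(expected_json):
            mismatches.append(name)
    # Exit code 1 on any mismatch, 0 when every comparison passes.
    return (1 if mismatches else 0), mismatches


exit_code, mismatches = compare_results([
    ("example1.raw.PluginName.trace",
     '{"trace": {"file": {"size": 0}}}',
     '{"trace": {"file": {"size": 0}}}'),
    ("example1.text.PluginName.trace",
     '{"trace": {"file": {"size": 1}}}',
     '{"trace": {"file": {"size": 0}}}'),
])
print(exit_code, mismatches)
```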

Given the files in the example above, the test runner will invoke the
Extraction Plugin three times:

   Input                                           Result
1. example1.trace with data stream example1.raw    example1.raw.PluginName.trace
2. example1.trace with data stream example1.text   example1.text.PluginName.trace
3. example2.trace with data stream example2.raw    example2.raw.PluginName.trace
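
The pairing of traces, data streams, and result file names can be illustrated with a
short Python sketch. The function `plan_invocations` is hypothetical and only mirrors
the naming convention described here. It assumes a data stream belongs to the trace
with the same base name, and it ignores deferred traces and their search traces.

```python
from pathlib import PurePath


def plan_invocations(input_names, plugin_name):
    """Pair every trace with its data streams and name the expected result file."""
    names = sorted(input_names)
    plan = []
    for trace in names:
        if not trace.endswith(".trace"):
            continue
        base = trace[: -len(".trace")]
        for name in names:
            # Assumption: a data stream shares the trace's base name
            # but has a different extension.
            if name != trace and PurePath(name).stem == base:
                plan.append((trace, name, f"{name}.{plugin_name}.trace"))
    return plan


files = ["example1.raw", "example1.text", "example1.trace",
         "example2.raw", "example2.trace"]
for trace, stream, result in plan_invocations(files, "PluginName"):
    print(f"{trace} with data stream {stream} -> {result}")
```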


----------------------------------------------------------------------------------------
Trace format

Input and result traces are stored in a JSON structure. The root of the structure is
the key `trace`, and its value is a mapping of properties in which each dotted property
name is split into nested objects. The following example shows a serialized trace with
three properties: `file.size`, `file.type`, and `document.author`.

{
  "trace" : {
    "file" : {
      "size" : 0,
      "type" : "raw"
    },
    "document" : {
      "author" : "me"
    }
  }
}
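
To make the relation between the nested structure and the dotted property names
explicit, the following Python sketch flattens a serialized trace. The helper `flatten`
is hypothetical and not part of the test runner. It assumes every nested JSON object
is a property-name level rather than a property value, and leaves other values
(including lists) untouched.

```python
import json


def flatten(properties, prefix=""):
    """Turn nested property objects into a flat mapping of dotted property names."""
    flat = {}
    for key, value in properties.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Assumption: a nested object is another level of the property name.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


serialized = """
{
  "trace" : {
    "file" : { "size" : 0, "type" : "raw" },
    "document" : { "author" : "me" }
  }
}
"""
flat_properties = flatten(json.loads(serialized)["trace"])
print(flat_properties)
```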

Result traces can have child traces as well. These are stored in the trace under a
reserved field `children`, where `children` is a list of traces.

Data streams are passed to an extraction plugin as-is.


----------------------------------------------------------------------------------------
(Re)generate results

Given an extraction plugin and an input data set, the test runner can generate or
update the result set. To do so, start the test runner with the additional argument
`--regenerate`.

 