Reads the version number and GitHub URL from a configuration file bundled into the jar.
For convenience, case classes can mix in this single trait to implement PipelineRunnerSupport
Represents data in a persistent store.
Interface for defining how to persist a data type.
Class for writing that exposes a more restrictive interface than OutputStream. In particular, clients cannot close the stream, and character encoding is forced to UTF-8.
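The restriction described above can be illustrated with a small wrapper. This is only a sketch under assumptions: the name `RestrictedWriter` and its methods are hypothetical and are not taken from the library.

```scala
import java.io.OutputStream
import java.nio.charset.StandardCharsets

// Hypothetical wrapper (not the library's class): it holds the stream
// privately and offers no close() method, so callers cannot close it.
final class RestrictedWriter(out: OutputStream) {
  // Always encode as UTF-8, regardless of the platform default charset.
  def write(s: String): Unit = out.write(s.getBytes(StandardCharsets.UTF_8))

  // Write a string followed by a newline character.
  def println(s: String): Unit = write(s + "\n")

  // Flushing is harmless, so it can stay visible to callers.
  def flush(): Unit = out.flush()
}
```

Whoever created the underlying stream remains responsible for closing it, which keeps resource ownership in one place.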
Contains information about the origin of the compiled class implementing a Producer
A version number, e.g. a git tag
The latest version number at which the logic for this class changed. Classes in which the buildIds differ but the unchangedSince field is the same are assumed to produce the same outputs when given the same inputs
Link to source (e.g. in GitHub)
Link to binaries (e.g. in Nexus)
Support for persisting to a column-delimited file. Persisted object can be a case-class or Tuple. Each field of the object must be a primitive type (Int, Double, String) and will be written as a column in the output file.
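As a rough illustration of the column mapping, any case class or tuple can be rendered through the standard `Product` interface. The helper `toColumns`, the object `ColumnFormatSketch`, and the `Score` class are hypothetical names used only for this sketch; the library's own format class may behave differently.

```scala
object ColumnFormatSketch {
  // Hypothetical helper: one delimited line per object, one column per field.
  // Relies on each field's toString, so fields should be primitive types.
  def toColumns(row: Product, sep: String = "\t"): String =
    row.productIterator.map(_.toString).mkString(sep)
}

// Example record type, made up for illustration.
final case class Score(name: String, count: Int, weight: Double)
```

Calling `ColumnFormatSketch.toColumns(Score("a", 1, 0.5))` yields the three fields joined by tabs, and a tuple such as `("x", 2)` works the same way because tuples are also `Product` instances.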
Directory of files.
Flat file.
Generic data blob.
Persist a collection of string-serializable objects to a flat file, one line per object.
Persist an iterator of string-serializable objects to a flat file, one line per object.
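The one-line-per-object layout can be sketched with a small type class for string conversion. Everything named here (`StringCodec`, `LineFileSketch`, `encode`, `decode`) is an assumption made for illustration and does not reflect the library's actual trait or method names.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

// Hypothetical type class: how to turn a value into a single line and back.
trait StringCodec[T] {
  def encode(value: T): String
  def decode(s: String): T
}

object LineFileSketch {
  implicit val intCodec: StringCodec[Int] = new StringCodec[Int] {
    def encode(value: Int): String = value.toString
    def decode(s: String): Int = s.toInt
  }

  // Consumes the iterator once, writing one encoded object per line.
  def write[T](file: File, items: Iterator[T])(implicit codec: StringCodec[T]): Unit = {
    val out = new PrintWriter(file, "UTF-8")
    try items.foreach(item => out.println(codec.encode(item)))
    finally out.close()
  }

  // Reads every line back and decodes it into an object.
  def read[T](file: File)(implicit codec: StringCodec[T]): Vector[T] = {
    val src = Source.fromFile(file, "UTF-8")
    try src.getLines().map(codec.decode).toVector
    finally src.close()
  }
}
```

Accepting an `Iterator` rather than a collection lets the writer stream data that does not fit in memory; a collection can be passed via `.iterator`.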
Represents dependency between Producer instances
Maven-style version id
Producer implementations that do not need to be executed by PipelineRunner can mix in this convenience trait. These methods will not be invoked if the output is retrieved by calling Producer.get instead of PipelineRunner.run
Represents a Producer instance with PipelineRunnerSupport
Executes a pipeline represented by a set of Producer instances. Inspects the meta-info about the pipeline steps (represented by the PipelineRunnerSupport interface) and builds a DAG representation of the pipeline. Visualizes the DAG in HTML and stores the HTML page along with the pipeline output. The output location of each pipeline step is not specified by the code that builds the pipeline. Instead, each step's output location is determined by the PipelineRunner based on the Signature of that step. This allows independent processes for pipelines with overlapping steps in their DAGs to re-use past calculations.
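The idea of deriving the output location from the step's signature, and thereby re-using earlier results, can be sketched as follows. This is a deliberately simplified stand-in: `RunnerSketch`, `Step`, the `outputs/` prefix, and the in-memory map that plays the role of the persistent store are all hypothetical.

```scala
import scala.collection.mutable

object RunnerSketch {
  // Stand-in for a persistent store: output location -> stored output.
  val store = mutable.Map.empty[String, String]

  // Counts how many times a step was actually computed.
  var computations = 0

  // A step declares what it is (its signature id) and how to compute it,
  // but never where its output should be written.
  final case class Step(signatureId: String, compute: () => String)

  // The runner alone chooses the location, purely from the signature id.
  def locationOf(step: Step): String = s"outputs/${step.signatureId}"

  // Compute only if no output already exists at the derived location.
  def run(step: Step): String =
    store.getOrElseUpdate(locationOf(step), {
      computations += 1
      step.compute()
    })
}
```

Two pipelines that contain a step with the same signature id resolve to the same location, so the second run finds the existing output and skips the computation.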
This information is used by PipelineRunner to construct and visualize the DAG for a pipeline
An individual step in a data processing pipeline. A lazily evaluated calculation, with support for in-memory caching and persistence.
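Lazy evaluation with in-memory caching can be sketched with a `lazy val`. The class `LazyStep` and its `create`/`get` methods are hypothetical names chosen for this sketch, and persistence is left out entirely.

```scala
// Hypothetical miniature of a pipeline step (not the library's Producer).
abstract class LazyStep[T] {
  // Subclasses describe how to compute the value.
  protected def create: T

  // Evaluated on the first call to get, then kept in memory.
  private lazy val cached: T = create

  def get: T = cached
}

object LazyStepSketch {
  // Counts how often the upstream computation really runs.
  var calls = 0

  val numbers: LazyStep[Seq[Int]] = new LazyStep[Seq[Int]] {
    protected def create: Seq[Int] = { calls += 1; Seq(1, 2, 3) }
  }

  // A downstream step pulls its input from the upstream step on demand.
  val total: LazyStep[Int] = new LazyStep[Int] {
    protected def create: Int = numbers.get.sum
  }
}
```

Constructing the steps does no work; the first `total.get` triggers `numbers`, and later calls return the cached values.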
Artifact implementations using S3 storage.
Zip file stored in S3.
Acts as an identifier for a Producer instance. Represents the version of the implementation class, the inputs, and the static configuration. The PipelineRunner class uses a Producer's Signature to determine the path to the output data, so two Producers with the same signature must always produce identical output.
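A simplified picture of how such an identifier could be assembled from the implementation version, the inputs, and the static configuration is sketched below. The class `Sig`, its field names, and the textual `id` format are assumptions made for illustration, not the library's actual Signature.

```scala
// Hypothetical, simplified signature (the real class may differ).
final case class Sig(
    name: String,                    // human-readable name of the calculation
    unchangedSince: String,          // version at which the logic last changed
    dependencies: Map[String, Sig],  // signatures of the input steps
    parameters: Map[String, String]  // static configuration, already as strings
) {
  // Stable textual id. Keys are sorted so map ordering cannot affect it.
  def id: String = {
    val deps = dependencies.toSeq.sortBy(_._1).map { case (k, v) => s"$k=${v.id}" }
    val params = parameters.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }
    s"$name@$unchangedSince(${(deps ++ params).mkString(",")})"
  }
}
```

Because the id of each input is embedded recursively, changing a parameter or a version anywhere upstream changes the id of every downstream step, which is what allows the id to stand in for the output.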
Human-readable name for the calculation done by a Producer. Usually the class name, typically a verb
The latest version number at which the logic for this class changed. Default is "0", meaning all release builds of this class have equivalent logic
The inputs to the Producer
Static configuration for the Producer. Default is to use .toString for constructor parameters that are not Producer instances. If some parameters are non-primitive types, those types should have .toString methods that are consistent with .equals.
Persist a single object to a flat file.
Serialize an object to/from a String
Artifact with nested structure, containing multiple data blobs identified by String names. Only one level of structure is supported.
Represents code from an unspecified location
DAG representation of the execution of a set of Producers
Zip file.
Utility methods for Artifact reading/writing.
Given a function that converts an InputStream into an Iterator, this closes the InputStream when the Iterator has been fully consumed.
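One way to get this behavior is to wrap the converted iterator and close the stream as soon as it reports no more elements. The names `AutoCloseSketch` and `autoClosing` are hypothetical; this is a minimal sketch and not the library's implementation.

```scala
import java.io.InputStream

object AutoCloseSketch {
  // Wraps the iterator produced by `convert` so that `in` is closed
  // once the last element has been consumed.
  def autoClosing[T](in: InputStream)(convert: InputStream => Iterator[T]): Iterator[T] = {
    val inner = convert(in)
    new Iterator[T] {
      private var open = true

      // Close exactly once, the first time the inner iterator is exhausted.
      private def closeIfExhausted(): Unit =
        if (open && !inner.hasNext) {
          in.close()
          open = false
        }

      // After closing, never touch the inner iterator again.
      def hasNext: Boolean = {
        closeIfExhausted()
        open
      }

      def next(): T = {
        if (!hasNext) throw new NoSuchElementException("stream exhausted")
        val value = inner.next()
        closeIfExhausted() // close right after handing out the last element
        value
      }
    }
  }
}
```

The stream is closed only when the iterator is fully consumed, so a caller that abandons the iterator early still has to close the stream by other means.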
Reads the version number and GitHub URL from a configuration file bundled into the jar. These are populated by the AI2 sbt-release plugin.