package org.allenai.pipeline

Type Members

  1. trait Ai2CodeInfo extends HasCodeInfo

    Reads the version number and GitHub URL from a configuration file bundled into the jar. These values are populated by the AI2 sbt-release plugin.

  2. trait Ai2Signature extends PipelineRunnerSupport with Ai2CodeInfo

    For convenience, case classes can mix in this single trait to implement PipelineRunnerSupport

  3. trait Artifact extends AnyRef

    Represents data in a persistent store.

  4. trait ArtifactIo[T, -A <: Artifact] extends HasCodeInfo

    Interface for defining how to persist a data type.

  5. class ArtifactStreamWriter extends AnyRef

    Class for writing that exposes a more restrictive interface than OutputStream. In particular, clients cannot close the stream, and character encoding is forced to UTF-8.
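
    As a rough illustration of this idea (not the library's actual class), a restricted writer can wrap an OutputStream, encode every string as UTF-8, and simply not offer a close method. The names RestrictedWriter, write, println, and demo below are hypothetical.

    ```scala
    import java.io.{ByteArrayOutputStream, OutputStream}
    import java.nio.charset.StandardCharsets

    object RestrictedWriterSketch {
      // Hypothetical stand-in for illustration; not ArtifactStreamWriter itself.
      final class RestrictedWriter(out: OutputStream) {
        // Every string is encoded as UTF-8, regardless of the platform default.
        def write(s: String): Unit = out.write(s.getBytes(StandardCharsets.UTF_8))
        def println(s: String): Unit = write(s + "\n")
        // Deliberately no close(): whoever opened `out` is responsible for closing it.
      }

      // Writes one line and returns the raw bytes that reached the stream.
      def demo(): Array[Byte] = {
        val buffer = new ByteArrayOutputStream()
        new RestrictedWriter(buffer).println("caf\u00e9")
        buffer.toByteArray
      }
    }
    ```

    Because the wrapper holds the only reference clients see, code that receives it has no way to close the underlying stream early.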

  6. trait CachingDisabled extends CachingEnabled

  7. trait CachingEnabled extends AnyRef

  8. case class CodeInfo(className: String, buildId: String, unchangedSince: String, srcUrl: Option[URI], binaryUrl: Option[URI]) extends Product with Serializable

    Contains information about the origin of the compiled class implementing a Producer

    buildId

    A version number, e.g. git tag

    unchangedSince

    The latest version number at which the logic for this class changed. Classes whose buildIds differ but whose unchangedSince values match are assumed to produce the same outputs when given the same inputs.

    srcUrl

    Link to source (e.g. in GitHub)

    binaryUrl

    Link to binaries (e.g. in Nexus)
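
    The role of unchangedSince can be sketched as follows. This is a simplified, hypothetical mirror of the fields above (Info and sameLogic are not part of the library), and the version strings are made up for illustration.

    ```scala
    object CodeInfoSketch {
      // Simplified, hypothetical mirror of the fields described above.
      final case class Info(className: String, buildId: String, unchangedSince: String)

      // Two builds of a class are treated as equivalent when the class and its
      // unchangedSince version match, even if the buildIds differ.
      def sameLogic(a: Info, b: Info): Boolean =
        a.className == b.className && a.unchangedSince == b.unchangedSince

      // Illustrative values only.
      val v1 = Info("Tokenize", buildId = "1.4.0", unchangedSince = "1.2.0")
      val v2 = Info("Tokenize", buildId = "1.5.0", unchangedSince = "1.2.0")
      val v3 = Info("Tokenize", buildId = "1.6.0", unchangedSince = "1.6.0")
    }
    ```

    Here v1 and v2 would be considered interchangeable, while v3 signals that the logic changed and earlier outputs should not be reused.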

  9. trait ColumnFormats extends AnyRef

    Support for persisting to a column-delimited file. The persisted object can be a case class or a tuple. Each field of the object must be a primitive type (Int, Double, String) and is written as one column in the output file.
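
    A minimal sketch of the column layout, assuming a tab delimiter and a three-field case class. Score, toLine, and fromLine are hypothetical helpers written for this example, not the methods ColumnFormats provides.

    ```scala
    object ColumnSketch {
      // Hypothetical record type with only primitive fields.
      final case class Score(name: String, count: Int, weight: Double)

      // One column per field; the tab delimiter is an assumption.
      def toLine(s: Score): String =
        Seq(s.name, s.count.toString, s.weight.toString).mkString("\t")

      // Split with limit -1 so empty trailing columns are not silently dropped.
      def fromLine(line: String): Score = line.split("\t", -1) match {
        case Array(name, count, weight) => Score(name, count.toInt, weight.toDouble)
        case other =>
          throw new IllegalArgumentException(s"Expected 3 columns, got ${other.length}")
      }
    }
    ```

    The sketch assumes that no field value contains the delimiter itself.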

  10. class DirectoryArtifact extends StructuredArtifact

    Directory of files.

  11. class FileArtifact extends FlatArtifact

    Flat file.

  12. trait FlatArtifact extends Artifact

    Generic data blob.

  13. trait HasCodeInfo extends AnyRef

  14. class LineCollectionIo[T] extends ArtifactIo[Iterable[T], FlatArtifact] with Ai2CodeInfo

    Persist a collection of string-serializable objects to a flat file, one line per object.

  15. class LineIteratorIo[T] extends ArtifactIo[Iterator[T], FlatArtifact] with Ai2CodeInfo

    Persist an iterator of string-serializable objects to a flat file, one line per object.
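
    The one-line-per-object layout used by LineCollectionIo and LineIteratorIo can be sketched like this. Codec, writeLines, and readLines are hypothetical names; the library's StringSerializable and IO classes may be shaped differently.

    ```scala
    object LineSketch {
      // Hypothetical string codec, standing in for a string-serializable type.
      trait Codec[T] {
        def encode(value: T): String
        def decode(line: String): T
      }

      val intCodec: Codec[Int] = new Codec[Int] {
        def encode(value: Int): String = value.toString
        def decode(line: String): Int = line.toInt
      }

      // Each object becomes exactly one newline-terminated line.
      def writeLines[T](items: Iterable[T], codec: Codec[T]): String =
        items.map(item => codec.encode(item) + "\n").mkString

      // Assumes encoded values are non-empty and contain no newline.
      def readLines[T](text: String, codec: Codec[T]): List[T] =
        text.split("\n").filter(_.nonEmpty).map(codec.decode).toList
    }
    ```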

  16. case class Link(fromId: String, toId: String, name: String) extends Product with Serializable

    Represents dependency between Producer instances

  17. case class MavenVersionId(major: Int, minor: Option[Int] = None, incremental: Option[Int] = None, build: Option[Int] = None, qualifier: Option[String] = None) extends Comparable[MavenVersionId] with Product with Serializable

    Maven-style version id

  18. trait NoPipelineRunnerSupport extends PipelineRunnerSupport

    Producer implementations that do not need to be executed by PipelineRunner can mix in this convenience trait. These methods are not invoked if the output is retrieved by calling Producer.get instead of PipelineRunner.run.

  19. case class Node(info: CodeInfo, params: Map[String, String], outputPath: Option[URI]) extends Product with Serializable

    Represents a Producer instance with PipelineRunnerSupport

  20. class PersistedProducer[T, A <: Artifact] extends Producer[T]

  21. class PipelineRunner extends pipeline.IoHelpers.ArtifactFactory[(Signature, String)]

    Executes a pipeline represented by a set of Producer instances. Inspects the meta-info about the pipeline steps (represented by the PipelineRunnerSupport interface) and builds a DAG representation of the pipeline. Visualizes the DAG in HTML and stores the HTML page along with the pipeline output. The output location of each pipeline step is not specified by the code that builds the pipeline. Instead, each step's output location is determined by the PipelineRunner based on the Signature of that step. This allows independent processes running pipelines with overlapping steps in their DAGs to reuse past calculations.

  22. trait PipelineRunnerSupport extends HasCodeInfo

    This information is used by PipelineRunner to construct and visualize the DAG for a pipeline

  23. trait Producer[T] extends Logging with CachingEnabled with PipelineRunnerSupport

    An individual step in a data processing pipeline. A lazily evaluated calculation, with support for in-memory caching and persistence.
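
    Lazy evaluation with in-memory caching can be illustrated in a few lines. Step is a hypothetical class written for this example and says nothing about how Producer is actually implemented; persistence is left out entirely.

    ```scala
    object LazyStepSketch {
      // Hypothetical Step class; not the library's Producer trait.
      final class Step[T](compute: () => T) {
        // lazy val defers the computation until first access and caches the result.
        private lazy val cached: T = compute()
        def get: T = cached
      }

      // Counts how often the computation actually runs.
      var evaluations = 0
      val squares = new Step(() => { evaluations += 1; (1 to 3).map(n => n * n).toList })
    }
    ```

    Constructing squares does no work; the first call to get runs the computation, and later calls return the cached list.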

  24. trait ReadHelpers extends ColumnFormats

  25. trait S3Artifact[A <: Artifact] extends Logging

  26. case class S3Config(service: AmazonS3Client, bucket: String) extends Product with Serializable

  27. class S3FlatArtifact extends FlatArtifact with S3Artifact[FileArtifact]

    Artifact implementations using S3 storage.

  28. class S3ZipArtifact extends StructuredArtifact with S3Artifact[ZipFileArtifact]

    Zip file stored in S3.

  29. case class Signature(name: String, unchangedSinceVersion: String, dependencies: Map[String, PipelineRunnerSupport], parameters: Map[String, String]) extends Product with Serializable

    Acts as an identifier for a Producer instance. Represents the version of the implementation class, the inputs, and the static configuration. The PipelineRunner class uses a Producer's Signature to determine the path to the output data, so two Producers with the same signature must always produce identical output.

    name

    Human-readable name for the calculation done by a Producer. Usually the class name, typically a verb

    unchangedSinceVersion

    The latest version number at which the logic for this class changed. Default is "0", meaning all release builds of this class have equivalent logic

    dependencies

    The inputs to the Producer

    parameters

    Static configuration for the Producer. Default is to use .toString for constructor parameters that are not Producer instances. If some parameters are non-primitive types, those types should have .toString methods that are consistent with .equals.
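
    One way to picture how a signature could map to an output location is sketched below. Everything here is an assumption made for illustration: Sig reduces dependencies to plain ids, and the SHA-1 hash and file naming scheme are invented, not the scheme PipelineRunner uses.

    ```scala
    import java.nio.charset.StandardCharsets
    import java.security.MessageDigest

    object SignatureSketch {
      // Hypothetical simplified signature: dependencies are reduced to their ids.
      final case class Sig(
        name: String,
        unchangedSinceVersion: String,
        dependencyIds: Map[String, String],
        parameters: Map[String, String]
      )

      // Deterministic id: map entries are sorted so insertion order cannot matter.
      def id(sig: Sig): String = {
        val parts = Seq(sig.name, sig.unchangedSinceVersion) ++
          sig.dependencyIds.toSeq.sorted.map { case (k, v) => s"dep:$k=$v" } ++
          sig.parameters.toSeq.sorted.map { case (k, v) => s"param:$k=$v" }
        val digest = MessageDigest.getInstance("SHA-1")
          .digest(parts.mkString("|").getBytes(StandardCharsets.UTF_8))
        digest.map(b => "%02x".format(b)).mkString
      }

      // Assumed naming scheme, for illustration only.
      def outputPath(sig: Sig): String = s"${sig.name}.${id(sig)}.txt"
    }
    ```

    Since the path depends only on the signature, two steps with equal signatures resolve to the same location, which is why their outputs must be identical and why parameter .toString values need to be consistent with .equals.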

  30. class SingletonIo[T] extends ArtifactIo[T, FlatArtifact] with Ai2CodeInfo

    Persist a single object to a flat file.

  31. trait StringSerializable[T] extends AnyRef

    Serialize an object to/from a String

  32. trait StructuredArtifact extends Artifact

    Artifact with nested structure, containing multiple data blobs identified by String names. Only one level of structure is supported.

  33. trait UnknownCodeInfo extends HasCodeInfo

    Represents code from an unspecified location

  34. case class Workflow(nodes: Map[String, Node], links: Iterable[Link]) extends Product with Serializable

    DAG representation of the execution of a set of Producers
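
    To show what a nodes-and-links DAG makes possible, here is a sketch that derives an execution order with Kahn's algorithm. Edge is a hypothetical, simplified mirror of Link, and the sketch assumes an edge means that fromId must be processed before toId; the source does not state the direction.

    ```scala
    object DagSketch {
      // Hypothetical, simplified mirror of Link: an edge between two node ids.
      final case class Edge(fromId: String, toId: String)

      // Kahn's algorithm: repeatedly emit nodes with no remaining incoming edges.
      // Assumes every edge refers to ids contained in nodeIds.
      def topologicalOrder(nodeIds: Set[String], edges: Seq[Edge]): List[String] = {
        var inDegree = nodeIds.map(id => id -> edges.count(_.toId == id)).toMap
        var ready = inDegree.collect { case (id, 0) => id }.toList.sorted
        var order = List.empty[String]
        while (ready.nonEmpty) {
          val current = ready.head
          ready = ready.tail
          order = current :: order
          for (e <- edges if e.fromId == current) {
            val remaining = inDegree(e.toId) - 1
            inDegree = inDegree.updated(e.toId, remaining)
            if (remaining == 0) ready = (e.toId :: ready).sorted
          }
        }
        // If some node never reached in-degree zero, the graph has a cycle.
        require(order.size == nodeIds.size, "cycle detected: not a DAG")
        order.reverse
      }
    }
    ```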

  35. trait WriteHelpers extends AnyRef

  36. class ZipFileArtifact extends StructuredArtifact

    Zip file.

Value Members

  1. object CodeInfo extends Serializable

  2. object IoHelpers extends ReadHelpers with WriteHelpers

    Utility methods for Artifact reading/writing.

  3. object LineCollectionIo

  4. object LineIteratorIo

  5. object MavenVersionId extends Serializable

  6. object PipelineRunner

  7. object Producer2

  8. object Producer3

  9. object Producer4

  10. object Producer5

  11. object S3Config extends Serializable

  12. object Signature extends Serializable

  13. object SingletonIo

  14. object StreamClosingIterator

    Given a function that converts an InputStream into an Iterator, this closes the InputStream when the Iterator has been fully consumed.
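
    The technique can be sketched as a wrapper iterator that closes the stream once hasNext first reports false. The helper closingIterator and the TrackedStream test class are hypothetical; the library's StreamClosingIterator may have a different signature.

    ```scala
    import java.io.{ByteArrayInputStream, InputStream}

    object ClosingIteratorSketch {
      // Hypothetical helper illustrating the technique.
      def closingIterator[T](is: InputStream)(read: InputStream => Iterator[T]): Iterator[T] = {
        val underlying = read(is)
        new Iterator[T] {
          private var closed = false
          def hasNext: Boolean = {
            val more = underlying.hasNext
            // Close exactly once, as soon as the iterator is exhausted.
            if (!more && !closed) { is.close(); closed = true }
            more
          }
          def next(): T = underlying.next()
        }
      }

      // Stream that records whether close() was called, for demonstration.
      final class TrackedStream(bytes: Array[Byte]) extends ByteArrayInputStream(bytes) {
        var wasClosed = false
        override def close(): Unit = { wasClosed = true; super.close() }
      }
    }
    ```

    One limitation of this approach is that a caller who abandons the iterator before exhausting it leaves the stream open.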

  15. object StructuredArtifact

  16. object Workflow extends Serializable
