org.emmalanguage.api

DataBag

trait DataBag[A] extends Serializable

An abstraction for homogeneous distributed collections.

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Abstract Value Members

  1. abstract def collect(): Seq[A]

    Converts the DataBag back into a Scala Seq. Warning: do not call this on DataBags that are too large to fit on one machine!

    returns

    The contents of the DataBag as a scala Seq.

  2. abstract def distinct: DataBag[A]

    Removes duplicate entries from the bag, e.g.:

    DataBag(Seq(1, 1, 2, 3)).distinct = DataBag(Seq(1, 2, 3))

    returns

    A version of this DataBag without duplicate entries.

  3. abstract def flatMap[B](f: (A) ⇒ DataBag[B])(implicit arg0: Meta[B]): DataBag[B]

    Monad flatMap.

    B

    Element type of the output DataBag.

    f

    Function to be applied on the collection. The resulting bags are flattened.

    returns

    A DataBag containing the union (flattening) of the DataBags f(x) produced for each element x of the input.

  4. abstract def fold[B](agg: Alg[A, B])(implicit arg0: Meta[B]): B

    Structural recursion over the bag. Assumes an algebraic specification of the DataBag type using three constructors:

    sealed trait DataBag[A]
    case class Sng[A](x: A) extends DataBag[A]
    case class Union[A](xs: DataBag[A], ys: DataBag[A]) extends DataBag[A]
    case object Empty extends DataBag[Nothing]

    The function then denotes the following recursive computation:

    this match {
      case Empty => agg.zero
      case Sng(x) => agg.init(x)
      case Union(xs, ys) => agg.plus(xs.fold(agg), ys.fold(agg))
    }

    agg

    The algebra parameterizing the recursion scheme.
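
    The recursion scheme above can be tried out on a small local model. In the following sketch, Bag, Empty, Sng, Union, Alg and BagDemo are hypothetical stand-ins written for illustration, not Emma's classes. Alg bundles the (zero, init, plus) triple mentioned for the concrete fold overload, and it is declared contravariant in its element type so that an algebra over Any also applies to a bag of Int.

    ```scala
    // Hypothetical algebra: value for Empty, lifting of one element, merge of two results.
    final case class Alg[-A, B](zero: B, init: A => B, plus: (B, B) => B)

    // Hypothetical algebraic model of a bag with the three constructors of the specification.
    sealed trait Bag[+A] {
      // Structural recursion: each constructor is replaced by the matching algebra operation.
      def fold[B](agg: Alg[A, B]): B = this match {
        case Empty         => agg.zero
        case Sng(x)        => agg.init(x)
        case Union(xs, ys) => agg.plus(xs.fold(agg), ys.fold(agg))
      }
    }
    case object Empty extends Bag[Nothing]
    final case class Sng[A](x: A) extends Bag[A]
    final case class Union[A](xs: Bag[A], ys: Bag[A]) extends Bag[A]

    // Hypothetical helpers for trying out the model.
    object BagDemo {
      // Builds a balanced Union tree from a sequence.
      def fromSeq[A](xs: Seq[A]): Bag[A] =
        if (xs.isEmpty) Empty
        else if (xs.size == 1) Sng(xs.head)
        else {
          val (l, r) = xs.splitAt(xs.size / 2)
          Union(fromSeq(l), fromSeq(r))
        }

      // Counts elements: every element contributes 1.
      val size: Alg[Any, Long] = Alg[Any, Long](0L, _ => 1L, _ + _)
      // Sums integer elements.
      val sum: Alg[Int, Int] = Alg[Int, Int](0, x => x, _ + _)
    }
    ```

    Both algebras use an associative plus with zero as its neutral element, so the result does not depend on how the Union tree is shaped; this is what allows partial results to be computed independently and merged afterwards.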

  5. abstract def groupBy[K](k: (A) ⇒ K)(implicit arg0: Meta[K]): DataBag[Group[K, DataBag[A]]]

    Groups the bag by key.

    K

    Key type.

    k

    Key selector function.

    returns

    A bag containing one Group per distinct key, pairing the key with a DataBag of all entries that map to it.
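
    The grouping semantics can be illustrated with a local sketch in which a plain Seq plays the role of DataBag. Group below is a hypothetical case class with key and values fields, introduced only for this example; the field names of the real Group type are an assumption here, as is the GroupBySketch helper object.

    ```scala
    // Hypothetical stand-in for the Group type in the groupBy signature.
    final case class Group[K, V](key: K, values: V)

    // Hypothetical helper object; a Seq stands in for DataBag.
    object GroupBySketch {
      // One Group per distinct key, holding every element that maps to that key (duplicates kept).
      def groupBy[A, K](xs: Seq[A])(k: A => K): Seq[Group[K, Seq[A]]] =
        xs.groupBy(k).toSeq.map { case (key, vs) => Group(key, vs) }

      // Typical follow-up: aggregate each group, here by counting its members.
      def countByKey[A, K](xs: Seq[A])(k: A => K): Map[K, Int] =
        groupBy(xs)(k).map(g => g.key -> g.values.size).toMap
    }
    ```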

  6. implicit abstract def m: Meta[A]

  7. abstract def map[B](f: (A) ⇒ B)(implicit arg0: Meta[B]): DataBag[B]

    Monad map.

    B

    Element type of the output DataBag.

    f

    Function to be applied on the collection.

    returns

    A DataBag containing the elements f(x) produced for each element x of the input.

  8. abstract def sample(k: Int, seed: Long = 5394826801L): Vector[A]

    Creates a sample of up to k elements using reservoir sampling initialized with the given seed.

    If the collection represented by the DataBag instance contains fewer than k elements, the resulting sample is correspondingly smaller.

    The method should be deterministic for a fixed DataBag instance with a materialized result. In other words, calling xs.sample(k, seed) twice in succession returns the same result.

    The result, however, might vary between program runs and DataBag implementations.
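
    Reservoir sampling itself can be sketched on a local collection. The version below is the classic Algorithm R driven by scala.util.Random with the given seed; ReservoirSketch is a hypothetical helper, and nothing here claims that a DataBag backend samples exactly this way. For a fixed input order and seed it returns the same sample on every call, which mirrors the determinism requirement above.

    ```scala
    import scala.collection.mutable.ArrayBuffer
    import scala.util.Random

    // Hypothetical helper object illustrating Algorithm R on a local collection.
    object ReservoirSketch {
      // Fill the reservoir with the first k elements, then let the i-th element
      // (0-based) replace a random slot with probability k / (i + 1).
      def sample[A](xs: Iterable[A], k: Int, seed: Long = 5394826801L): Vector[A] = {
        require(k >= 0, "k must be non-negative")
        val rnd       = new Random(seed)
        val reservoir = ArrayBuffer.empty[A]
        var i         = 0 // number of elements seen before the current one
        for (x <- xs) {
          if (i < k) reservoir += x
          else {
            val j = rnd.nextInt(i + 1) // uniform in [0, i]
            if (j < k) reservoir(j) = x
          }
          i += 1
        }
        reservoir.toVector
      }
    }
    ```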

  9. abstract def union(that: DataBag[A]): DataBag[A]

    Union operator. Respects duplicates, e.g.:

    DataBag(Seq(1, 1, 2, 3)) union DataBag(Seq(1, 2, 5)) = DataBag(Seq(1, 1, 2, 3, 1, 2, 5))

    that

    The bag to combine with this one.

    returns

    The bag (multiset) union of this DataBag and the given one, with all duplicates preserved.

  10. abstract def withFilter(p: (A) ⇒ Boolean): DataBag[A]

    Monad filter.

    p

    Predicate to be applied on the collection. Only qualifying elements are passed down the chain.
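
    Since DataBag offers map, flatMap and withFilter, Scala for-comprehensions over bags are rewritten by the compiler into calls to these three methods. The sketch below demonstrates this with LocalBag, a hypothetical Vector-backed stand-in, and a hypothetical ForComprehensionDemo object; unlike the real signatures, LocalBag's map and flatMap take no implicit Meta evidence.

    ```scala
    // Hypothetical, Vector-backed stand-in used only to illustrate the monad operations.
    final case class LocalBag[A](elems: Vector[A]) {
      def map[B](f: A => B): LocalBag[B] = LocalBag(elems.map(f))
      // Apply f to every element and flatten the resulting bags into one.
      def flatMap[B](f: A => LocalBag[B]): LocalBag[B] = LocalBag(elems.flatMap(a => f(a).elems))
      // Keep only the elements satisfying p (eager here, which is enough for the sketch).
      def withFilter(p: A => Boolean): LocalBag[A] = LocalBag(elems.filter(p))
    }

    // Hypothetical demo data: a join expressed as a for-comprehension.
    object ForComprehensionDemo {
      val customers = LocalBag(Vector(1 -> "alice", 2 -> "bob"))
      val orders    = LocalBag(Vector(1 -> 10.0, 1 -> 5.0, 2 -> 7.5))

      // Roughly: customers.flatMap(c => orders.withFilter(o => c._1 == o._1).map(o => (c._2, o._2)))
      val joined: LocalBag[(String, Double)] = for {
        (cid, name)   <- customers
        (oid, amount) <- orders
        if cid == oid
      } yield (name, amount)
    }
    ```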

  11. abstract def writeCSV(path: String, format: CSV)(implicit converter: CSVConverter[A]): Unit

    Writes a DataBag into the specified path in a CSV format.

    path

    The location where the data will be written.

    format

    The CSV format configuration

    converter

    A converter to use for element serialization.

  12. abstract def writeParquet(path: String, format: Parquet)(implicit converter: ParquetConverter[A]): Unit

    Writes a DataBag into the specified path in Parquet format.

    path

    The location where the data will be written.

    format

    The Parquet format configuration.

    converter

    A converter to use for element serialization.

  13. abstract def writeText(path: String): Unit

    Writes a DataBag into the specified path as plain text.

    The serialization logic is backend-specific.

    path

    The location where the data will be written.

  14. abstract def zipWithIndex(): DataBag[(A, Long)]

    Zips the elements of this collection with a unique dense index.

    The method should be deterministic for a fixed DataBag instance with a materialized result. In other words, calling xs.zipWithIndex() two times in succession will return the same result.

    The result, however, might vary between program runs and DataBag implementations.
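
    For a partitioned collection, one common way to produce a unique dense index is a two-pass scheme: count the elements of each partition, turn the counts into start offsets with an exclusive prefix sum, and number every partition locally from its offset. The sketch below models the partitions as a plain Seq[Seq[A]]; ZipWithIndexSketch is hypothetical and does not describe a particular Emma backend. It also shows why the indices are only reproducible while the partitioning and the element order stay fixed.

    ```scala
    // Hypothetical helper object; partitions(i) holds the elements of partition i in processing order.
    object ZipWithIndexSketch {
      def zipWithIndex[A](partitions: Seq[Seq[A]]): Seq[Seq[(A, Long)]] = {
        // Pass 1: partition sizes.
        val counts = partitions.map(_.size.toLong)
        // Exclusive prefix sum: offsets(i) = number of elements in partitions 0 until i.
        val offsets = counts.scanLeft(0L)(_ + _)
        // Pass 2: number each partition locally, shifted by its offset.
        partitions.zip(offsets).map { case (part, offset) =>
          part.zipWithIndex.map { case (a, i) => (a, offset + i) }
        }
      }
    }
    ```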

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def as[DColl[_]](implicit conv: (DataBag[A]) ⇒ DColl[A]): DColl[A]

    Converts this bag into a distributed collection of type DColl[A].

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def bottom(n: Int)(implicit o: Ordering[A]): List[A]

    Find the bottom n elements in the collection with respect to the natural ordering of the elements' type.

    n

    number of elements to return

    o

    the implicit Ordering of elements

    returns

    an ordered (ascending) List of the bottom n elements

  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def count(p: (A) ⇒ Boolean): Long

    Count the number of elements in the collection that satisfy a predicate.

    p

    the predicate to test against

    returns

    the number of elements that satisfy p

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(o: Any): Boolean

    Definition Classes
    DataBag → AnyRef → Any
  11. def exists(p: (A) ⇒ Boolean): Boolean

    Test if at least one element of the collection satisfies p.

    p

    predicate to test against the elements of the collection

    returns

    true if the collection contains an element that satisfies the predicate

  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. def find(p: (A) ⇒ Boolean): Option[A]

    Finds some element in the collection that satisfies a given predicate.

    p

    the predicate to test against

    returns

    Some element if one exists, None otherwise

  14. def fold[B](zero: B)(init: (A) ⇒ B, plus: (B, B) ⇒ B)(implicit arg0: Meta[B]): B

    Delegates to fold(Alg(zero, init, plus)).

  15. def forall(p: (A) ⇒ Boolean): Boolean

    Test if all elements of the collection satisfy p.

    p

    predicate to test against the elements of the collection

    returns

    true if all the elements of the collection satisfy the predicate

  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def hashCode(): Int

    Definition Classes
    DataBag → AnyRef → Any
  18. def isEmpty: Boolean

    Test the collection for emptiness.

    returns

    true if the collection contains no elements at all
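
    Several of the members above (count, exists, find, forall, isEmpty) can be phrased as instances of fold. In the sketch below, FoldDerived and its fold helper over a Seq are hypothetical stand-ins for DataBag.fold(zero)(init, plus); the sketch does not claim that Emma implements these members this way.

    ```scala
    // Hypothetical helper object; a Seq stands in for DataBag.
    object FoldDerived {
      // Hypothetical local stand-in for fold(zero)(init, plus) on a DataBag.
      def fold[A, B](xs: Seq[A])(zero: B)(init: A => B, plus: (B, B) => B): B =
        xs.foldLeft(zero)((acc, a) => plus(acc, init(a)))

      def count[A](xs: Seq[A])(p: A => Boolean): Long =
        fold(xs)(0L)(a => if (p(a)) 1L else 0L, _ + _)

      def exists[A](xs: Seq[A])(p: A => Boolean): Boolean = fold(xs)(false)(p, _ || _)

      def forall[A](xs: Seq[A])(p: A => Boolean): Boolean = fold(xs)(true)(p, _ && _)

      // Any qualifying element may be returned; with foldLeft it happens to be the first one.
      def find[A](xs: Seq[A])(p: A => Boolean): Option[A] =
        fold(xs)(Option.empty[A])(a => Some(a).filter(p), _ orElse _)

      def isEmpty[A](xs: Seq[A]): Boolean = fold(xs)(true)(_ => false, _ && _)
    }
    ```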

  19. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  20. def max(implicit o: Ordering[A]): A

    Find the largest element in the collection with respect to the natural ordering of the elements' type.

    o

    the implicit natural Ordering of the elements

    Exceptions thrown

    Exception if the collection is empty

  21. def min(implicit o: Ordering[A]): A

    Find the smallest element in the collection with respect to the natural ordering of the elements' type.

    o

    the implicit natural Ordering of the elements

    Exceptions thrown

    Exception if the collection is empty

  22. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. def nonEmpty: Boolean

    Test the collection for non-emptiness.

    returns

    true if the collection has at least one element

  24. final def notify(): Unit

    Definition Classes
    AnyRef
  25. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  26. def product(implicit n: Numeric[A]): A

    Calculate the product over all elements in the collection.

    n

    implicit Numeric operations of the elements

    returns

    the product of all elements, or one if the collection is empty

  27. def reduce[B >: A](zero: B)(plus: (B, B) ⇒ B)(implicit arg0: Meta[B]): B

    Shortcut for fold(zero)(identity, plus).

    B

    result type (a supertype of the element type A)

    returns

    the result of combining all elements into one

  28. def reduceOption(plus: (A, A) ⇒ A): Option[A]

    Shortcut for fold(None)(Some, Option.lift2(plus)), i.e. reduces the collection to a single element by repeatedly applying the binary operator plus; yields None if the collection is empty.

    returns

    the result of reducing all elements into one

  29. def size: Long

    returns

    the number of elements in the collection

  30. def sum(implicit n: Numeric[A]): A

    Calculate the sum over all elements in the collection.

    n

    implicit Numeric operations of the elements

    returns

    the sum of all elements, or zero if the collection is empty
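
    The reductions above (reduce, reduceOption, sum, product, min, max) are likewise small specializations of fold. In this sketch, Reductions, its local fold helper and lift2 are hypothetical; lift2 plays the role that the notation Option.lift2 has in the reduceOption description. min and max have no neutral element, so they go through reduceOption and throw on empty input, in line with the documented exception.

    ```scala
    // Hypothetical helper object; a Seq stands in for DataBag.
    object Reductions {
      // Hypothetical local stand-in for fold(zero)(init, plus) on a DataBag.
      def fold[A, B](xs: Seq[A])(zero: B)(init: A => B, plus: (B, B) => B): B =
        xs.foldLeft(zero)((acc, a) => plus(acc, init(a)))

      // Hypothetical lift2: lifts a binary operator to Option, with None as the neutral element.
      def lift2[A](f: (A, A) => A): (Option[A], Option[A]) => Option[A] = {
        case (Some(x), Some(y)) => Some(f(x, y))
        case (x, None)          => x
        case (None, y)          => y
      }

      def reduce[A, B >: A](xs: Seq[A])(zero: B)(plus: (B, B) => B): B =
        fold[A, B](xs)(zero)(a => a, plus)

      def reduceOption[A](xs: Seq[A])(plus: (A, A) => A): Option[A] =
        fold(xs)(Option.empty[A])(a => Some(a), lift2(plus))

      def sum[A](xs: Seq[A])(implicit n: Numeric[A]): A = fold(xs)(n.zero)(a => a, n.plus)

      def product[A](xs: Seq[A])(implicit n: Numeric[A]): A = fold(xs)(n.one)(a => a, n.times)

      // No neutral element exists for max/min, so empty input raises an exception.
      def max[A](xs: Seq[A])(implicit o: Ordering[A]): A =
        reduceOption(xs)((x, y) => o.max(x, y)).getOrElse(throw new NoSuchElementException("max of empty collection"))

      def min[A](xs: Seq[A])(implicit o: Ordering[A]): A =
        reduceOption(xs)((x, y) => o.min(x, y)).getOrElse(throw new NoSuchElementException("min of empty collection"))
    }
    ```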

  31. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  32. def toString(): String

    Definition Classes
    AnyRef → Any
  33. def top(n: Int)(implicit o: Ordering[A]): List[A]

    Find the top n elements in the collection with respect to the natural ordering of the elements' type.

    n

    number of elements to return

    o

    the implicit Ordering of elements

    returns

    an ordered (descending) List of the top n elements
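
    A common way to compute top, and by reversing the ordering bottom, without sorting the whole collection is to keep a bounded heap of size n. The sketch below does this on a local collection with scala.collection.mutable.PriorityQueue; TopSketch is hypothetical, and the code is not claimed to match DataBag's implementation. It reproduces the documented result orders: descending for top, ascending for bottom.

    ```scala
    import scala.collection.mutable

    // Hypothetical helper object illustrating bounded-heap top/bottom selection.
    object TopSketch {
      // Keep the n largest elements seen so far in a min-heap (head = smallest kept element).
      def top[A](xs: Iterable[A], n: Int)(implicit o: Ordering[A]): List[A] = {
        val heap = mutable.PriorityQueue.empty[A](o.reverse)
        for (x <- xs) {
          if (heap.size < n) heap.enqueue(x)
          else if (n > 0 && o.gt(x, heap.head)) {
            heap.dequeue() // evict the smallest kept element
            heap.enqueue(x)
          }
        }
        heap.toList.sorted(o.reverse) // descending with respect to o
      }

      // bottom n is top n under the reversed ordering, which yields ascending order w.r.t. o.
      def bottom[A](xs: Iterable[A], n: Int)(implicit o: Ordering[A]): List[A] =
        top(xs, n)(o.reverse)
    }
    ```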

  34. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  35. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  36. final def wait(arg0: Long): Unit

    Permalink
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
