Converts the DataBag back into a Scala Seq. Warning: Do not call this on DataBags that are too large to fit on one machine!
The contents of the DataBag as a Scala Seq.
Removes duplicate entries from the bag, e.g.
DataBag(Seq(1,1,2,3)).distinct = DataBag(Seq(1,2,3))
A version of this DataBag without duplicate entries.
Monad flatMap.
Type of the output DataBag.
Function to be applied on the collection. The resulting bags are flattened.
A DataBag containing the union (flattening) of the DataBags f(x) produced for each element of the input.
Structural recursion over the bag. Assumes an algebraic specification of the DataBag type using three constructors:
sealed trait DataBag[A]
case class Sng[A](x: A) extends DataBag[A]
case class Union[A](xs: DataBag[A], ys: DataBag[A]) extends DataBag[A]
case object Empty extends DataBag[Nothing]
The function then denotes the following recursive computation:
this match {
  case Empty => agg.zero
  case Sng(x) => agg.init(x)
  case Union(xs, ys) => agg.plus(xs.fold(agg), ys.fold(agg))
}
The algebra parameterizing the recursion scheme.
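The recursion scheme above can be made concrete with a small, self-contained sketch. The names `Bag` and `Alg` and the two example algebras are illustrative assumptions and not the library's actual definitions; the sketch also makes the bag covariant so that `Empty` can be shared across element types.

```scala
// A minimal sketch of structural recursion over a bag (hypothetical types).
object FoldSketch {
  // The algebra: a result for Empty, a result for a singleton,
  // and a way to merge the results of the two sides of a Union.
  trait Alg[-A, B] {
    def zero: B
    def init(x: A): B
    def plus(x: B, y: B): B
  }

  sealed trait Bag[+A] {
    def fold[B](agg: Alg[A, B]): B = this match {
      case Empty         => agg.zero
      case Sng(x)        => agg.init(x)
      case Union(xs, ys) => agg.plus(xs.fold(agg), ys.fold(agg))
    }
  }
  case object Empty extends Bag[Nothing]
  final case class Sng[+A](x: A) extends Bag[A]
  final case class Union[+A](xs: Bag[A], ys: Bag[A]) extends Bag[A]

  // Counts elements: every singleton contributes 1.
  val sizeAlg: Alg[Any, Long] = new Alg[Any, Long] {
    def zero = 0L
    def init(x: Any) = 1L
    def plus(x: Long, y: Long) = x + y
  }

  // Sums integer elements.
  val sumAlg: Alg[Int, Int] = new Alg[Int, Int] {
    def zero = 0
    def init(x: Int) = x
    def plus(x: Int, y: Int) = x + y
  }

  // The bag {1, 1, 2}, built from the three constructors.
  val bag: Bag[Int] = Union(Union(Sng(1), Sng(1)), Union(Sng(2), Empty))
}
```

Because a bag has no fixed shape, a result is only well defined if `plus` is associative and commutative with `zero` as its neutral element; both example algebras satisfy this.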
Groups the bag by key.
Key type.
Key selector function.
A version of this bag with the entries grouped by key.
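As an illustration of the grouping semantics, the following sketch uses a plain Scala `Seq` in place of a DataBag. The `Group` case class and the exact result type are assumptions made for the example; the text above does not specify them.

```scala
// A local stand-in for grouping by key (hypothetical Group type).
object GroupBySketch {
  final case class Group[K, V](key: K, values: V)

  // One Group per distinct key, holding all elements that map to that key.
  def groupBy[A, K](xs: Seq[A])(k: A => K): Seq[Group[K, Seq[A]]] =
    xs.groupBy(k).toSeq.map { case (key, vs) => Group(key, vs) }

  // Group words by their first letter.
  val byInitial: Seq[Group[Char, Seq[String]]] =
    groupBy(Seq("apple", "avocado", "banana"))(_.head)
}
```

The order of the resulting groups is not meaningful, in keeping with bag semantics.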
Monad map.
Type of the output DataBag.
Function to be applied on the collection.
A DataBag containing the elements f(x) produced for each element x of the input.
Creates a sample of up to k elements using reservoir sampling initialized with the given seed.
If the collection represented by the DataBag instance contains fewer than k elements,
the result contains all of them and is therefore smaller than k.
The method should be deterministic for a fixed DataBag instance with a materialized result.
In other words, calling xs.sample(k)(seed) two times in succession will return the same result.
The result, however, might vary between program runs and DataBag implementations.
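For readers unfamiliar with reservoir sampling, here is a minimal single-machine sketch (the classic "Algorithm R") over a plain `Seq`. It is not the library's implementation; the function name and signature are assumptions chosen for illustration.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// A local sketch of seeded reservoir sampling (hypothetical helper).
object SampleSketch {
  def sample[A](xs: Seq[A], k: Int, seed: Long): Vector[A] = {
    val rnd = new Random(seed)            // a fixed seed gives a repeatable sequence
    val reservoir = ArrayBuffer.empty[A]
    xs.iterator.zipWithIndex.foreach { case (x, i) =>
      if (i < k) {
        reservoir += x                    // the first k elements fill the reservoir
      } else {
        val j = rnd.nextInt(i + 1)        // uniform in [0, i]
        if (j < k) reservoir(j) = x       // replace a slot with probability k / (i + 1)
      }
    }
    reservoir.toVector
  }
}
```

With a fixed input order and seed the sketch returns the same sample on every call. With fewer than k input elements it returns all of them.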
Union operator. Respects duplicates, e.g.:
DataBag(Seq(1,1,2,3)) plus DataBag(Seq(1,2,5)) = DataBag(Seq(1,1,2,3,1,2,5))
The second addend parameter.
The bag union (duplicates preserved) of this DataBag and the given one.
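One way to read "respects duplicates" is in terms of multiplicities: each element occurs in the union as many times as in both operands combined. The sketch below checks this on plain Scala sequences; `plus` and `multiplicities` are hypothetical local helpers, not DataBag methods.

```scala
// A local sketch of bag union on plain sequences (hypothetical helpers).
object UnionSketch {
  // Bag union: keep every occurrence from both sides.
  def plus[A](xs: Seq[A], ys: Seq[A]): Seq[A] = xs ++ ys

  // How often each distinct element occurs.
  def multiplicities[A](xs: Seq[A]): Map[A, Int] =
    xs.groupBy(identity).map { case (x, occurrences) => x -> occurrences.size }

  val union: Seq[Int] = plus(Seq(1, 1, 2, 3), Seq(1, 2, 5))
}
```

Here the element 1 occurs twice on the left and once on the right, so it occurs three times in the union.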
Monad filter.
Predicate to be applied on the collection. Only qualifying elements are passed down the chain.
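The three monad operations (map, flatMap, filter) are what Scala needs to desugar a for-comprehension. The sketch below shows this on a toy wrapper around `Seq`. The class `Bag` is a stand-in, and the method name `withFilter` is an assumption based on the Scala convention for filters in for-comprehensions.

```scala
// A toy bag supporting for-comprehensions (hypothetical stand-in type).
object MonadSketch {
  final case class Bag[A](elems: Seq[A]) {
    def map[B](f: A => B): Bag[B] = Bag(elems.map(f))
    // Apply f to each element and flatten the resulting bags.
    def flatMap[B](f: A => Bag[B]): Bag[B] = Bag(elems.flatMap(x => f(x).elems))
    // Only qualifying elements are passed down the chain.
    def withFilter(p: A => Boolean): Bag[A] = Bag(elems.filter(p))
  }

  // Desugars to withFilter, then flatMap, then map.
  val pairs: Bag[(Int, Int)] =
    for {
      x <- Bag(Seq(1, 2, 3))
      if x % 2 == 1
      y <- Bag(Seq(10, 20))
    } yield (x, y)
}
```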
Writes a DataBag into the specified path in a CSV format.
The location where the data will be written.
The CSV format configuration.
A converter to use for element serialization.
Writes a DataBag into the specified path in a CSV format.
The location where the data will be written.
The CSV format configuration.
A converter to use for element serialization.
Writes a DataBag into the specified path as plain text.
The serialization logic is backend-specific.
The location where the data will be written.
Zips the elements of this collection with a unique dense index.
The method should be deterministic for a fixed DataBag instance with a materialized result.
In other words, calling xs.zipWithIndex() two times in succession will return the same result.
The result, however, might vary between program runs and DataBag implementations.
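A "unique dense index" means that no two elements share an index and that the indices have no gaps. The following sketch shows this on a plain `Seq`. Starting at 0 and using `Long` indices are assumptions of the example, not guarantees stated above.

```scala
// A local sketch of a dense index over a sequence (hypothetical helper).
object ZipWithIndexSketch {
  // Pair every element with its position, widened to Long.
  def zipWithIndex[A](xs: Seq[A]): Seq[(A, Long)] =
    xs.zipWithIndex.map { case (x, i) => (x, i.toLong) }

  val indexed: Seq[(String, Long)] = zipWithIndex(Seq("a", "b", "a"))
}
```

Duplicate elements, such as the two occurrences of "a" here, receive different indices.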
Converts this bag into a distributed collection of type DColl[A].
Find the bottom n elements in the collection with respect to the natural ordering of the
elements' type.
number of elements to return
the implicit Ordering of elements
an ordered (ascending) List of the bottom n elements
Count the number of elements in the collection that satisfy a predicate.
the predicate to test against
the number of elements that satisfy p
Test if at least one element of the collection satisfies p.
predicate to test against the elements of the collection
true if the collection contains an element that satisfies the predicate
Finds some element in the collection that satisfies a given predicate.
the predicate to test against
Some element if one exists, None otherwise
Delegates to fold(Alg(zero, init, plus)).
Test if all elements of the collection satisfy p.
predicate to test against the elements of the collection
true if all the elements of the collection satisfy the predicate
Test the collection for emptiness.
true if the collection contains no elements at all
Find the largest element in the collection with respect to the natural ordering of the elements' type.
the implicit natural Ordering of the elements
Throws an exception if the collection is empty.
Find the smallest element in the collection with respect to the natural ordering of the elements' type.
the implicit natural Ordering of the elements
Throws an exception if the collection is empty.
Test the collection for non-emptiness.
true if the collection has at least one element
Calculate the product over all elements in the collection.
implicit Numeric operations of the elements
one if the collection is empty
Shortcut for fold(z)(identity, f).
return type (supertype of the element type)
the result of combining all elements into one
Shortcut for fold(None)(Some, Option.lift2(f)), which is the same as reducing the collection
to a single element by applying a binary operator.
the result of reducing all elements into one
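The shortcut can be unpacked with a small sketch: lifting `f` to `Option` makes `None` a neutral element, so an empty collection yields `None` instead of failing. The helpers below (`fold` over a `Seq`, `lift2`, `reduceOption`) are local illustrations and are not claimed to match the library's signatures.

```scala
// A local sketch of reduceOption as a fold (hypothetical helpers).
object ReduceOptionSketch {
  // A sequential stand-in for the three-argument fold.
  def fold[A, B](xs: Seq[A])(zero: B)(init: A => B, plus: (B, B) => B): B =
    xs.map(init).foldLeft(zero)(plus)

  // Lift a binary operator to Options, treating None as neutral.
  def lift2[A](f: (A, A) => A): (Option[A], Option[A]) => Option[A] =
    (a, b) => (a, b) match {
      case (Some(x), Some(y)) => Some(f(x, y))
      case (x, None)          => x
      case (None, y)          => y
    }

  def reduceOption[A](xs: Seq[A])(f: (A, A) => A): Option[A] =
    fold(xs)(Option.empty[A])(x => Some(x), lift2(f))
}
```

For example, reducing a non-empty sequence with `_ max _` gives `Some` of its maximum, while reducing an empty sequence gives `None`.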
the number of elements in the collection
Calculate the sum over all elements in the collection.
implicit Numeric operations of the elements
zero if the collection is empty
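Several of the aggregates described in this section can be read as instances of fold(zero)(init, plus). The sketch below spells out a few of them over a plain `Seq`, using a sequential stand-in for `fold`. It illustrates the idea and is not the library's code.

```scala
// A local sketch of aggregates expressed as folds (hypothetical helpers).
object AggregateSketch {
  // A sequential stand-in for the three-argument fold.
  def fold[A, B](xs: Seq[A])(zero: B)(init: A => B, plus: (B, B) => B): B =
    xs.map(init).foldLeft(zero)(plus)

  def size[A](xs: Seq[A]): Long =
    fold(xs)(0L)(_ => 1L, _ + _)

  def count[A](xs: Seq[A])(p: A => Boolean): Long =
    fold(xs)(0L)(x => if (p(x)) 1L else 0L, _ + _)

  def exists[A](xs: Seq[A])(p: A => Boolean): Boolean =
    fold(xs)(false)(p, _ || _)

  def forall[A](xs: Seq[A])(p: A => Boolean): Boolean =
    fold(xs)(true)(p, _ && _)

  // The neutral elements come from Numeric: zero for sum, one for product.
  def sum[A](xs: Seq[A])(implicit n: Numeric[A]): A =
    fold(xs)(n.zero)(identity, n.plus)

  def product[A](xs: Seq[A])(implicit n: Numeric[A]): A =
    fold(xs)(n.one)(identity, n.times)
}
```

The results for the empty collection fall out of the `zero` argument: 0 for `sum`, 1 for `product`, `false` for `exists`, and `true` for `forall`.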
Find the top n elements in the collection with respect to the natural ordering of the
elements' type.
number of elements to return
the implicit Ordering of elements
an ordered (descending) List of the top n elements
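The intended results of the top and bottom selections can be illustrated with a simple sort-based sketch over a plain `Seq`. A distributed implementation would likely avoid a full sort; the helper names and signatures here are assumptions made for the example.

```scala
// A local sketch of top-n and bottom-n selection (hypothetical helpers).
object TopBottomSketch {
  // The n smallest elements, in ascending order.
  def bottom[A](xs: Seq[A], n: Int)(implicit o: Ordering[A]): List[A] =
    xs.sorted(o).take(n).toList

  // The n largest elements, in descending order.
  def top[A](xs: Seq[A], n: Int)(implicit o: Ordering[A]): List[A] =
    xs.sorted(o.reverse).take(n).toList
}
```

If the collection has fewer than n elements, both helpers return all of them.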
An abstraction for homogeneous distributed collections.