trait StreamingWrite extends AnyRef
An interface that defines how to write data to a data source in streaming queries.
The writing procedure is:
1. Create a writer factory by createStreamingWriterFactory(PhysicalWriteInfo), serialize it, and send it to all the partitions of the input data (RDD).
2. For each epoch in each partition, create the data writer, and write the data of the epoch in the partition with this writer. If all the data are written successfully, call DataWriter#commit(). If an exception happens during the writing, call DataWriter#abort().
3. If writers in all partitions of one epoch are successfully committed, call commit(long, WriterCommitMessage[]). If some writers are aborted, or the job failed for an unknown reason, call abort(long, WriterCommitMessage[]).
While Spark will retry failed writing tasks, Spark won't retry failed writing jobs. Users should do it manually in their Spark applications if they want to retry.
Please refer to the documentation of commit/abort methods for detailed specifications.
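The writing procedure above can be sketched as a single-JVM simulation. Everything in this block is an illustrative assumption: the traits are local stand-ins that only mirror the signatures shown on this page (they are not the Spark types), and `RowCount`, `CountingWrite`, and `EpochRunner` are hypothetical names. The sketch skips serialization and task retries, which Spark would perform.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

// Local stand-ins mirroring the signatures on this page (NOT the Spark types).
trait WriterCommitMessage extends Serializable
trait DataWriter[T] {
  def write(record: T): Unit
  def commit(): WriterCommitMessage
  def abort(): Unit
}
trait StreamingDataWriterFactory extends Serializable {
  def createWriter(partitionId: Int, taskId: Long, epochId: Long): DataWriter[String]
}
final case class PhysicalWriteInfo(numPartitions: Int)
trait StreamingWrite {
  def createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory
  def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit
  def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit
}

// Hypothetical commit message: how many rows one partition wrote.
final case class RowCount(partitionId: Int, rows: Int) extends WriterCommitMessage

// Hypothetical toy sink that only counts rows per committed epoch.
class CountingWrite extends StreamingWrite {
  val committed = mutable.Map.empty[Long, Int]
  val aborted = mutable.Set.empty[Long]

  def createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory =
    new StreamingDataWriterFactory {
      def createWriter(partitionId: Int, taskId: Long, epochId: Long): DataWriter[String] =
        new DataWriter[String] {
          private var rows = 0
          def write(record: String): Unit = {
            // The record "bad" simulates a write failure.
            if (record == "bad") throw new IllegalArgumentException("bad record")
            rows += 1
          }
          def commit(): WriterCommitMessage = RowCount(partitionId, rows)
          def abort(): Unit = { rows = 0 }
        }
    }

  def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit =
    committed(epochId) = messages.collect { case RowCount(_, n) => n }.sum

  def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit =
    aborted += epochId
}

object EpochRunner {
  // Runs one epoch: one writer per partition, then a job-level commit or abort.
  def runEpoch(write: StreamingWrite, epochId: Long, partitions: Seq[Seq[String]]): Boolean = {
    // Step 1: create the factory (Spark would serialize it and ship it to executors).
    val factory = write.createStreamingWriterFactory(PhysicalWriteInfo(partitions.size))
    // Slots stay null for partitions whose writer never committed.
    val messages = new Array[WriterCommitMessage](partitions.size)
    var allCommitted = true
    // Step 2: write each partition, then commit or abort its writer.
    for ((rows, partitionId) <- partitions.zipWithIndex) {
      val writer = factory.createWriter(partitionId, partitionId.toLong, epochId)
      try {
        rows.foreach(writer.write)
        messages(partitionId) = writer.commit()
      } catch {
        case NonFatal(_) =>
          writer.abort()
          allCommitted = false
      }
    }
    // Step 3: job-level commit only if every partition committed.
    if (allCommitted) write.commit(epochId, messages) else write.abort(epochId, messages)
    allCommitted
  }
}
```

Calling `EpochRunner.runEpoch(new CountingWrite, 0L, Seq(Seq("a", "b"), Seq("c")))` exercises the commit path, while including a `"bad"` record in any partition exercises the abort path.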
- Since
3.0.0
Abstract Value Members
-
abstract
def
abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit
Aborts this writing job because some data writers have failed and keep failing when retried, or the Spark job fails for some unknown reason, or commit(long, WriterCommitMessage[]) fails.
If this method fails (by throwing an exception), the underlying data source may require manual cleanup.
Unless the abort is triggered by the failure of commit, the given messages will have some null slots, as there may be only a few data writers that were committed before the abort happens, or some data writers were committed but their commit messages haven't reached the driver when the abort is triggered. So this is just a "best effort" for data sources to clean up the data left by data writers.
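A minimal sketch of a best-effort abort that tolerates null slots follows. The names `StagedFile` and `StagingSink` are hypothetical, `WriterCommitMessage` is a local stand-in rather than the Spark type, and an in-memory set stands in for files staged by data writers.

```scala
import scala.collection.mutable

// Local stand-in for the commit message type (NOT the Spark type).
trait WriterCommitMessage extends Serializable

// Hypothetical commit message: names the staging file one writer produced.
final case class StagedFile(path: String) extends WriterCommitMessage

// Hypothetical sink; the set stands in for files currently staged on storage.
class StagingSink {
  val staged = mutable.Set.empty[String]

  def abort(epochId: Long, messages: Array[WriterCommitMessage]): Unit =
    messages.foreach {
      // Clean up only what the driver actually heard about.
      case StagedFile(path) => staged -= path
      // Null slot (message never arrived) or an unknown message type: skip it.
      case _ => ()
    }
}
```

If a writer staged a file but its message never reached the driver, that file survives the abort in this sketch. A data source that needs complete cleanup would have to find such leftovers another way, for example by scanning for everything staged under the aborted epoch.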
-
abstract
def
commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit
Commits this writing job for the specified epoch with a list of commit messages. The commit messages are collected from successful data writers and are produced by DataWriter#commit().
If this method fails (by throwing an exception), this writing job is considered to have failed, and the execution engine will attempt to call abort(long, WriterCommitMessage[]).
The execution engine may call commit multiple times for the same epoch in some circumstances. To support exactly-once data semantics, implementations must ensure that multiple commits for the same epoch are idempotent.
-
abstract
def
createStreamingWriterFactory(info: PhysicalWriteInfo): StreamingDataWriterFactory
Creates a writer factory which will be serialized and sent to executors.
If this method fails (by throwing an exception), the action will fail and no Spark job will be submitted.
- info
Information about the RDD that will be written to this data writer
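The requirement that repeated calls to commit for the same epoch be idempotent can be sketched as follows. This is one possible approach under stated assumptions, not the method Spark or any particular connector uses: `RowCount` and `IdempotentSink` are hypothetical names, `WriterCommitMessage` is a local stand-in, and epochs are assumed to be committed in increasing order. A real data source would also need to persist the last committed epoch atomically with the data.

```scala
// Local stand-in for the commit message type (NOT the Spark type).
trait WriterCommitMessage extends Serializable

// Hypothetical commit message: how many rows one writer produced.
final case class RowCount(rows: Long) extends WriterCommitMessage

// Hypothetical sink that remembers the highest epoch it has committed.
class IdempotentSink {
  private var lastCommittedEpoch = -1L
  private var total = 0L

  def totalRows: Long = total

  def commit(epochId: Long, messages: Array[WriterCommitMessage]): Unit = synchronized {
    // A repeated commit for an already committed epoch is a no-op.
    if (epochId > lastCommittedEpoch) {
      total += messages.collect { case RowCount(n) => n }.sum
      lastCommittedEpoch = epochId
    }
  }
}
```

Committing epoch 0 twice with the same messages changes the total only once; a later epoch is applied normally.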
Concrete Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()