trait StagingTableCatalog extends TableCatalog
An optional mix-in for implementations of TableCatalog that support staging the creation of
a table before committing the table's metadata along with its contents in CREATE TABLE AS
SELECT or REPLACE TABLE AS SELECT operations.
It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS
SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE
TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first
drop the table via TableCatalog#dropTable(Identifier), then create the table via
TableCatalog#createTable(Identifier, StructType, Transform[], Map), and then perform
the write via SupportsWrite#newWriteBuilder(LogicalWriteInfo).
However, if the write operation fails, the catalog will have already dropped the table, and the
planner cannot roll back the dropping of the table.
If the catalog implements this trait, it can implement the methods to "stage" the
creation and the replacement of a table. After the table's
BatchWrite#commit(WriterCommitMessage[]) is called,
StagedTable#commitStagedChanges() is called, at which point the staged table can
complete both the data write and the metadata swap operation atomically.
- Since
3.0.0
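The difference between the two paths described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ToyCatalog`, `replaceWithoutStaging`, and `replaceWithStaging` are hypothetical names and not part of the Spark API, and a "table" is reduced to a vector of rows.

```scala
import scala.collection.mutable

// Hypothetical in-memory model for illustration; none of these names are Spark APIs.
object ReplaceAtomicityDemo {
  // A "table" is reduced to its rows in this sketch.
  type Rows = Vector[String]

  final class ToyCatalog {
    val tables: mutable.Map[String, Rows] = mutable.Map.empty
  }

  // Non-staging path: drop, create, then write.
  // If write() throws, the old rows are already gone.
  def replaceWithoutStaging(catalog: ToyCatalog, name: String, write: () => Rows): Unit = {
    catalog.tables.remove(name)               // drop: cannot be rolled back
    catalog.tables.update(name, Vector.empty) // create the new, empty table
    catalog.tables.update(name, write())      // perform the write (may throw)
  }

  // Staging path: the write lands in a private value first.
  // The catalog changes only in the final single update.
  def replaceWithStaging(catalog: ToyCatalog, name: String, write: () => Rows): Unit = {
    val staged = write()                      // may throw; catalog untouched so far
    catalog.tables.update(name, staged)       // swap metadata and contents together
  }
}
```

With a write function that throws, the first method leaves the table empty, while the second leaves the original rows in place.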
Abstract Value Members
-
abstract
def
alterTable(ident: Identifier, changes: TableChange*): Table
Apply a set of changes to a table in the catalog.
Implementations may reject the requested changes. If any change is rejected, none of the changes should be applied to the table.
The requested changes must be applied in the order given.
If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- ident
a table identifier
- changes
changes to apply to the table
- returns
updated metadata for the table
- Definition Classes
- TableCatalog
- Exceptions thrown
IllegalArgumentException If any change is rejected by the implementation.
NoSuchTableException If the table doesn't exist or is a view
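The all-or-nothing and in-order requirements of alterTable can be illustrated with a minimal sketch. The `Change` hierarchy below is a hypothetical stand-in, not Spark's `TableChange`, and the rule that rejects keys starting with `reserved.` is an assumption made up for the example.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's TableChange classes.
object AlterTableSketch {
  sealed trait Change
  final case class SetProperty(key: String, value: String) extends Change
  final case class RemoveProperty(key: String) extends Change

  type Props = Map[String, String]

  // Applies the changes in the order given to an immutable copy.
  // If any change is rejected, an exception is thrown and `current` is unchanged,
  // so none of the changes take effect.
  def alter(current: Props, changes: Change*): Props =
    changes.foldLeft(current) {
      case (props, SetProperty(k, v)) =>
        if (k.startsWith("reserved.")) // assumed rejection rule for this sketch
          throw new IllegalArgumentException(s"Cannot set reserved property: $k")
        props + (k -> v)
      case (props, RemoveProperty(k)) =>
        props - k
    }
}
```

Because the fold works on immutable maps, a caller that stores the result only after `alter` returns gets the "none of the changes should be applied" behavior without extra bookkeeping.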
-
abstract
def
createTable(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): Table
Create a table in the catalog.
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchNamespaceException If the identifier namespace does not exist (optional)
TableAlreadyExistsException If a table or view already exists for the identifier
UnsupportedOperationException If a requested partition transform is not supported
-
abstract
def
dropTable(ident: Identifier): Boolean
Drop a table in the catalog.
If the catalog supports views and contains a view for the identifier and not a table, this must not drop the view and must return false.
- ident
a table identifier
- returns
true if a table was deleted, false if no table exists for the identifier
- Definition Classes
- TableCatalog
-
abstract
def
initialize(name: String, options: CaseInsensitiveStringMap): Unit
Called to initialize configuration.
This method is called once, just after the provider is instantiated.
- name
the name used to identify and load this catalog
- options
a case-insensitive string map of configuration
- Definition Classes
- CatalogPlugin
-
abstract
def
listTables(namespace: Array[String]): Array[Identifier]
List the tables in a namespace from the catalog.
If the catalog supports views, this must return identifiers for only tables and not views.
- namespace
a multi-part namespace
- returns
an array of Identifiers for tables
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchNamespaceException If the namespace does not exist (optional).
-
abstract
def
loadTable(ident: Identifier): Table
Load table metadata by identifier from the catalog.
If the catalog supports views and contains a view for the identifier and not a table, this must throw NoSuchTableException.
- ident
a table identifier
- returns
the table's metadata
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableException If the table doesn't exist or is a view
-
abstract
def
name(): String
Called to get this catalog's name.
This method is only called after initialize(String, CaseInsensitiveStringMap) is called to pass the catalog's name.
- Definition Classes
- CatalogPlugin
-
abstract
def
renameTable(oldIdent: Identifier, newIdent: Identifier): Unit
Renames a table in the catalog.
If the catalog supports views and contains a view for the old identifier and not a table, this throws NoSuchTableException. Additionally, if the new identifier is a table or a view, this throws TableAlreadyExistsException.
If the catalog does not support table renames between namespaces, it throws UnsupportedOperationException.
- oldIdent
the table identifier of the existing table to rename
- newIdent
the new table identifier of the table
- Definition Classes
- TableCatalog
- Exceptions thrown
NoSuchTableException If the table to rename doesn't exist or is a view
TableAlreadyExistsException If the new table name already exists or is a view
UnsupportedOperationException If the namespaces of old and new identifiers do not match (optional)
-
abstract
def
stageCreate(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation of a table, preparing it to be committed into the metastore.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists when this method is called, the method should throw an exception accordingly. If another process concurrently creates the table before this table's staged changes are committed, an exception should be thrown by StagedTable#commitStagedChanges().
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Exceptions thrown
NoSuchNamespaceException If the identifier namespace does not exist (optional)
TableAlreadyExistsException If a table or view already exists for the identifier
UnsupportedOperationException If a requested partition transform is not supported
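The two existence checks that stageCreate calls for, one when staging starts and one at commit time, might look roughly like the following. This is a sketch over a hypothetical in-memory model: `ToyCatalog`, `Staged`, and `TableExists` are illustrative names, not Spark classes, and the method takes only a table name instead of the full argument list.

```scala
import scala.collection.mutable

// Hypothetical in-memory model of the stageCreate contract; not Spark classes.
object StageCreateSketch {
  final class TableExists(name: String)
    extends RuntimeException(s"Table already exists: $name")

  final class ToyCatalog {
    val tables: mutable.Map[String, Vector[String]] = mutable.Map.empty

    // Fails immediately if the table already exists when staging starts.
    def stageCreate(name: String): Staged = {
      if (tables.contains(name)) throw new TableExists(name)
      new Staged(this, name)
    }
  }

  final class Staged(catalog: ToyCatalog, name: String) {
    private var rows = Vector.empty[String]

    // Writes are buffered and stay invisible to the catalog until commit.
    def write(row: String): Unit = { rows = rows :+ row }

    // Re-checks existence so that a concurrent create is detected at commit time.
    def commitStagedChanges(): Unit = {
      if (catalog.tables.contains(name)) throw new TableExists(name)
      catalog.tables.update(name, rows)
    }

    // Discards the buffered rows; the catalog was never touched.
    def abortStagedChanges(): Unit = { rows = Vector.empty }
  }
}
```

A real catalog would need the commit-time check and the update to happen as one atomic step in the backing store; the sketch is single-threaded and does not attempt that.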
-
abstract
def
stageCreateOrReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the creation or replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyway or abort the commit operation.
If the table does not exist when the changes are committed, the table should be created in the backing data source. This differs from the expected semantics of stageReplace(Identifier, StructType, Transform[], Map), which should fail when the staged changes are committed but the table doesn't exist at commit time.
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Exceptions thrown
NoSuchNamespaceException If the identifier namespace does not exist (optional)
UnsupportedOperationException If a requested partition transform is not supported
-
abstract
def
stageReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: Map[String, String]): StagedTable
Stage the replacement of a table, preparing it to be committed into the metastore when the returned table's StagedTable#commitStagedChanges() is called.
When the table is committed, the contents of any writes performed by the Spark planner are committed along with the metadata about the table passed into this method's arguments. If the table exists, the metadata and the contents of this table replace the metadata and contents of the existing table. If a concurrent process commits changes to the table's data or metadata while the write is being performed but before the staged changes are committed, the catalog can decide whether to move forward with the table replacement anyway or abort the commit operation.
If the table does not exist, committing the staged changes should fail with NoSuchTableException. This differs from the semantics of stageCreateOrReplace(Identifier, StructType, Transform[], Map), which should create the table in the data source if the table does not exist at the time of committing the operation.
- ident
a table identifier
- schema
the schema of the new table, as a struct type
- partitions
transforms to use for partitioning data in the table
- properties
a string map of table properties
- returns
metadata for the new table
- Exceptions thrown
NoSuchNamespaceException If the identifier namespace does not exist (optional)
NoSuchTableException If the table does not exist
UnsupportedOperationException If a requested partition transform is not supported
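The commit-time difference between stageReplace and stageCreateOrReplace can be summarized in a short sketch. As before, this is a hypothetical in-memory model: `ToyCatalog`, `Staged`, `NoSuchTable`, and the `mustExist` flag are assumptions made for illustration and do not appear in the Spark API.

```scala
import scala.collection.mutable

// Hypothetical model contrasting the two replace variants; not Spark classes.
object StageReplaceSketch {
  final class NoSuchTable(name: String)
    extends RuntimeException(s"Table not found: $name")

  final class ToyCatalog {
    val tables: mutable.Map[String, Vector[String]] = mutable.Map.empty

    // Replace only: the table must exist when staging and again at commit.
    def stageReplace(name: String): Staged = {
      if (!tables.contains(name)) throw new NoSuchTable(name)
      new Staged(this, name, mustExist = true)
    }

    // Create or replace: a missing table is simply created at commit.
    def stageCreateOrReplace(name: String): Staged =
      new Staged(this, name, mustExist = false)
  }

  final class Staged(catalog: ToyCatalog, name: String, mustExist: Boolean) {
    private var rows = Vector.empty[String]

    // Buffered until commit, so the existing table stays readable meanwhile.
    def write(row: String): Unit = { rows = rows :+ row }

    def commitStagedChanges(): Unit = {
      if (mustExist && !catalog.tables.contains(name)) throw new NoSuchTable(name)
      catalog.tables.update(name, rows) // swap in new contents in one step
    }
  }
}
```

In this model a table dropped between staging and commit makes the stageReplace commit fail, while the stageCreateOrReplace commit recreates it.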
Concrete Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
def
defaultNamespace(): Array[String]
Return a default namespace for the catalog.
When this catalog is set as the current catalog, the namespace returned by this method will be set as the current namespace.
The namespace returned by this method is not required to exist.
- returns
a multi-part namespace
- Definition Classes
- CatalogPlugin
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
invalidateTable(ident: Identifier): Unit
Invalidate cached table metadata for an identifier.
If the table is already loaded or cached, drop cached data. If the table does not exist or is not cached, do nothing. Calling this method should not query remote services.
- ident
a table identifier
- Definition Classes
- TableCatalog
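The invalidateTable contract (drop cached data, do nothing for unknown tables, never call a remote service) could be satisfied by a cache along these lines. `CachingCatalog`, `fetchRemote`, and `remoteCalls` are hypothetical names used only for this sketch, and table metadata is reduced to a string.

```scala
import scala.collection.mutable

// Hypothetical caching wrapper; the names below are illustrative, not Spark APIs.
object InvalidateSketch {
  final class CachingCatalog(fetchRemote: String => String) {
    private val cache = mutable.Map.empty[String, String]

    // Counts remote lookups so the behavior can be observed.
    var remoteCalls: Int = 0

    // Loads from the cache, falling back to the remote service on a miss.
    def loadTable(name: String): String =
      cache.getOrElseUpdate(name, { remoteCalls += 1; fetchRemote(name) })

    // Touches only the local cache: no remote call, and a no-op for unknown names.
    def invalidateTable(name: String): Unit = {
      cache.remove(name)
      ()
    }
  }
}
```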
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
tableExists(ident: Identifier): Boolean
Test whether a table exists using an identifier from the catalog.
If the catalog supports views and contains a view for the identifier and not a table, this must return false.
- ident
a table identifier
- returns
true if the table exists, false otherwise
- Definition Classes
- TableCatalog
-
def
toString(): String
- Definition Classes
- AnyRef → Any
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()