package catalog
Type Members
-
trait
CatalogExtension extends TableCatalog with SupportsNamespaces
An API to extend the Spark built-in session catalog. Implementations can get the built-in session catalog from #setDelegateCatalog(CatalogPlugin), implement catalog functions with some custom logic, and call the built-in session catalog at the end. For example, they can implement createTable, do something else, and then call createTable of the built-in session catalog.
- Since
3.0.0
-
class
CatalogNotFoundException extends SparkException
- Annotations
- @Experimental()
-
trait
CatalogPlugin extends AnyRef
A marker interface to provide a catalog implementation for Spark.
Implementations can provide catalog functions by implementing additional interfaces for tables, views, and functions.
Catalog implementations must implement this marker interface to be loaded by Catalogs#load(String, SQLConf). The loader will instantiate catalog classes using the required public no-arg constructor. After creating an instance, it will be configured by calling #initialize(String, CaseInsensitiveStringMap).
Catalog implementations are registered to a name by adding a configuration option to Spark: spark.sql.catalog.catalog-name=com.example.YourCatalogClass. All configuration properties in the Spark configuration that share the catalog name prefix, spark.sql.catalog.catalog-name.(key)=(value), will be passed in the case-insensitive string map of options in initialization with the prefix removed. The catalog's name, in this case "catalog-name", is also passed as the name argument.
- Since
3.0.0
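As a sketch of the loading lifecycle above (the class name, catalog name, and option key are illustrative, not part of Spark): a catalog registered as spark.sql.catalog.my_catalog=com.example.MyCatalog is instantiated through its public no-arg constructor and then configured with the remaining prefixed options.

```scala
import org.apache.spark.sql.connector.catalog.CatalogPlugin
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Registered with: spark.sql.catalog.my_catalog=com.example.MyCatalog
// An extra option such as spark.sql.catalog.my_catalog.url=... arrives in
// `options` as url=... (prefix removed, keys case-insensitive).
class MyCatalog extends CatalogPlugin {
  private var catalogName: String = _

  override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
    catalogName = name // "my_catalog", the name used in the configuration key
  }

  override def name(): String = catalogName
}
```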
-
abstract
class
DelegatingCatalogExtension extends CatalogExtension
A simple implementation of CatalogExtension, which implements all the catalog functions by calling the built-in session catalog directly. This is created for convenience, so that users only need to override the methods where they want to apply custom logic. For example, they can override createTable, do something else, and then call super.createTable.
- Since
3.0.0
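A minimal sketch of the override pattern described above (the class name and the logging line are illustrative): custom logic runs first, then the call is delegated to the built-in session catalog via super.

```scala
import java.util
import org.apache.spark.sql.connector.catalog.{DelegatingCatalogExtension, Identifier, Table}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType

// Hypothetical extension: log each table creation, then delegate to the
// built-in session catalog.
class AuditingSessionCatalog extends DelegatingCatalogExtension {
  override def createTable(
      ident: Identifier,
      schema: StructType,
      partitions: Array[Transform],
      properties: util.Map[String, String]): Table = {
    println(s"creating table $ident") // custom logic before delegation
    super.createTable(ident, schema, partitions, properties)
  }
}
```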
-
trait
Identifier extends AnyRef
Identifies an object in a catalog.
- Since
3.0.0
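An identifier pairs a multi-part namespace with an object name; the values below are illustrative.

```scala
import org.apache.spark.sql.connector.catalog.Identifier

val ident = Identifier.of(Array("db", "schema"), "events")
ident.namespace() // the namespace parts: Array("db", "schema")
ident.name()      // the object name: "events"
```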
-
trait
NamespaceChange extends AnyRef
NamespaceChange subclasses represent requested changes to a namespace. These are passed to SupportsNamespaces#alterNamespace. For example,
  import NamespaceChange._
  val catalog = Catalogs.load(name)
  catalog.alterNamespace(ident,
    setProperty("prop", "value"),
    removeProperty("other_prop")
  )
- Since
3.0.0
-
trait
SessionConfigSupport extends TableProvider
A mix-in interface for TableProvider. Data sources can implement this interface to propagate session configs with the specified key-prefix to all data source operations in this session.
- Since
3.0.0
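A sketch of the mix-in (the source name, prefix, and schema are illustrative; the returned Table is elided with ???): session configs of the form spark.datasource.mysource.(key)=(value) are propagated to every operation of this source as (key)=(value), with the prefix removed.

```scala
import java.util
import org.apache.spark.sql.connector.catalog.{SessionConfigSupport, Table, TableProvider}
import org.apache.spark.sql.connector.expressions.Transform
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

class MySource extends TableProvider with SessionConfigSupport {
  // the prefix matched against spark.datasource.mysource.* session configs
  override def keyPrefix(): String = "mysource"

  override def inferSchema(options: CaseInsensitiveStringMap): StructType =
    new StructType().add("value", "string")

  override def getTable(
      schema: StructType,
      partitioning: Array[Transform],
      properties: util.Map[String, String]): Table = ??? // sketch only
}
```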
-
trait
StagedTable extends Table
Represents a table which is staged for being committed to the metastore.
This is used to implement atomic CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT queries. The planner will create one of these via StagingTableCatalog#stageCreate(Identifier, StructType, Transform[], Map) or StagingTableCatalog#stageReplace(Identifier, StructType, Transform[], Map) to prepare the table for being written to. This table should usually implement SupportsWrite. A new writer will be constructed via SupportsWrite#newWriteBuilder(LogicalWriteInfo), and the write will be committed. The job concludes with a call to #commitStagedChanges(), at which point implementations are expected to commit the table's metadata into the metastore along with the data that was written by the writes from the write builder this table created.
- Since
3.0.0
-
trait
StagingTableCatalog extends TableCatalog
An optional mix-in for implementations of TableCatalog that support staging the creation of a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.
It is highly recommended to implement this trait whenever possible, so that CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first drop the table via TableCatalog#dropTable(Identifier), then create the table via TableCatalog#createTable(Identifier, StructType, Transform[], Map), and then perform the write via SupportsWrite#newWriteBuilder(LogicalWriteInfo). However, if the write operation fails, the catalog will have already dropped the table, and the planner cannot roll back the dropping of the table.
If the catalog implements this plugin, the catalog can implement the methods to "stage" the creation and the replacement of a table. After the table's BatchWrite#commit(WriterCommitMessage[]) is called, StagedTable#commitStagedChanges() is called, at which point the staged table can complete both the data write and the metadata swap operation atomically.
- Since
3.0.0
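The atomic REPLACE TABLE AS SELECT flow can be sketched from the caller's side (a simplified outline of what the planner drives; catalog, ident, schema, partitions, properties, and info are assumed to be in scope, and error handling is omitted):

```scala
// 1. Stage the replacement instead of drop-then-create.
val staged = catalog.stageReplace(ident, schema, partitions, properties)

// 2. Write through the staged table's write builder.
val writeBuilder =
  staged.asInstanceOf[org.apache.spark.sql.connector.catalog.SupportsWrite]
    .newWriteBuilder(info)
// ... the query runs and BatchWrite#commit(WriterCommitMessage[]) is called ...

// 3. Commit data and the metadata swap atomically.
staged.commitStagedChanges()
```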
-
trait
SupportsCatalogOptions extends TableProvider
An interface, which TableProviders can implement, to support table existence checks and creation through a catalog, without having to use table identifiers. For example, when file-based data sources use the DataFrameWriter.save(path) method, the option path can translate to a PathIdentifier. A catalog can then use this PathIdentifier to check the existence of a table, or whether a table can be created at a given directory.
- Since
3.0.0
-
trait
SupportsDelete extends AnyRef
A mix-in interface for Table delete support. Data sources can implement this interface to provide the ability to delete data that matches filter expressions from tables.
- Since
3.0.0
-
trait
SupportsNamespaces extends CatalogPlugin
Catalog methods for working with namespaces.
If an object such as a table, view, or function exists, its parent namespaces must also exist and must be returned by the discovery methods #listNamespaces() and #listNamespaces(String[]).
Catalog implementations are not required to maintain the existence of namespaces independent of objects in a namespace. For example, a function catalog that loads functions using reflection and uses Java packages as namespaces is not required to support the methods to create, alter, or drop a namespace. Implementations are allowed to discover the existence of objects or namespaces without throwing NoSuchNamespaceException when no namespace is found.
- Since
3.0.0
-
trait
SupportsRead extends Table
A mix-in interface of Table, to indicate that it's readable. This adds #newScanBuilder(CaseInsensitiveStringMap), which is used to create a scan for batch, micro-batch, or continuous processing.
- Since
3.0.0
-
trait
SupportsWrite extends Table
A mix-in interface of Table, to indicate that it's writable. This adds #newWriteBuilder(LogicalWriteInfo), which is used to create a write for batch or streaming.
- Since
3.0.0
-
trait
Table extends AnyRef
An interface representing a logical structured data set of a data source. For example, the implementation can be a directory on the file system, a topic of Kafka, or a table in the catalog.
This interface can mix in SupportsRead and SupportsWrite to provide data reading and writing ability.
The default implementation of #partitioning() returns an empty array of partitions, and the default implementation of #properties() returns an empty map. These should be overridden by implementations that support partitioning and table properties.
- Since
3.0.0
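A minimal sketch of a Table (the table name and schema are illustrative): only the three abstract methods are implemented, while the default empty partitioning() and properties() are inherited unchanged, and the capability set advertises batch-scan support.

```scala
import java.util
import org.apache.spark.sql.connector.catalog.{Table, TableCapability}
import org.apache.spark.sql.types.StructType

class EventsTable extends Table {
  override def name(): String = "events"

  override def schema(): StructType = new StructType().add("id", "long")

  // Signals to Spark that this table can be read with a batch scan.
  override def capabilities(): util.Set[TableCapability] =
    util.EnumSet.of(TableCapability.BATCH_READ)
}
```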
-
sealed abstract final
class
TableCapability extends Enum[TableCapability]
Capabilities that can be provided by a Table implementation.
Tables use Table#capabilities() to return a set of capabilities. Each capability signals to Spark that the table supports a feature identified by the capability. For example, returning #BATCH_READ allows Spark to read from the table using a batch scan.
- Since
3.0.0
-
trait
TableCatalog extends CatalogPlugin
Catalog methods for working with Tables.
TableCatalog implementations may be case sensitive or case insensitive. Spark will pass table identifiers without modification. Field names passed to alterTable(Identifier, TableChange...) will be normalized to match the case used in the table schema when updating, renaming, or dropping existing columns when catalyst analysis is case insensitive.
- Since
3.0.0
-
trait
TableChange extends AnyRef
TableChange subclasses represent requested changes to a table. These are passed to TableCatalog#alterTable. For example,
  import TableChange._
  val catalog = Catalogs.load(name)
  catalog.asTableCatalog.alterTable(ident,
    addColumn("x", IntegerType),
    renameColumn("a", "b"),
    deleteColumn("c")
  )
- Since
3.0.0
-
trait
TableProvider extends AnyRef
The base interface for v2 data sources which don't have a real catalog. Implementations must have a public, 0-arg constructor.
Note that TableProvider can only apply data operations to existing tables, like read, append, delete, and overwrite. It does not support the operations that require metadata changes, like create/drop tables.
The major responsibility of this interface is to return a Table for read/write.
- Since
3.0.0