Packages

package catalog

Type Members

  1. trait CatalogExtension extends TableCatalog with SupportsNamespaces

    An API to extend the Spark built-in session catalog.

    An API to extend the Spark built-in session catalog. Implementations can get the built-in session catalog from #setDelegateCatalog(CatalogPlugin), implement catalog functions with custom logic, and call the built-in session catalog at the end. For example, they can implement createTable to do something else before calling createTable of the built-in session catalog.

    Since

    3.0.0

  2. class CatalogNotFoundException extends SparkException

    Annotations

    @Experimental()

  3. trait CatalogPlugin extends AnyRef

    A marker interface to provide a catalog implementation for Spark.

    A marker interface to provide a catalog implementation for Spark.

    Implementations can provide catalog functions by implementing additional interfaces for tables, views, and functions.

    Catalog implementations must implement this marker interface to be loaded by Catalogs#load(String, SQLConf). The loader will instantiate catalog classes using the required public no-arg constructor. After creating an instance, it will be configured by calling #initialize(String, CaseInsensitiveStringMap).

    Catalog implementations are registered to a name by adding a configuration option to Spark: spark.sql.catalog.catalog-name=com.example.YourCatalogClass. All configuration properties in the Spark configuration that share the catalog name prefix, spark.sql.catalog.catalog-name.(key)=(value), will be passed in the case-insensitive string map of options at initialization with the prefix removed. The name argument is also passed and is the catalog's name; in this case, "catalog-name".

    Since

    3.0.0
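
    As a minimal sketch (the class name and the endpoint option are hypothetical), a catalog plugin registered as spark.sql.catalog.my_catalog=com.example.MyCatalog might look like:

      import org.apache.spark.sql.connector.catalog.CatalogPlugin
      import org.apache.spark.sql.util.CaseInsensitiveStringMap

      class MyCatalog extends CatalogPlugin {
        private var catalogName: String = _

        // Called once after the no-arg constructor; `options` holds the
        // spark.sql.catalog.my_catalog.* entries with the prefix stripped.
        override def initialize(name: String, options: CaseInsensitiveStringMap): Unit = {
          catalogName = name
          val endpoint = options.get("endpoint") // from spark.sql.catalog.my_catalog.endpoint
        }

        override def name(): String = catalogName
      }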

  4. abstract class DelegatingCatalogExtension extends CatalogExtension

    A simple implementation of CatalogExtension, which implements all the catalog functions by calling the built-in session catalog directly.

    A simple implementation of CatalogExtension, which implements all the catalog functions by calling the built-in session catalog directly. This is created for convenience, so that users only need to override the methods where they want to apply custom logic. For example, they can override createTable to do something else before calling super.createTable.

    Since

    3.0.0
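
    As a sketch (the AuditingCatalog name is hypothetical), an extension that runs custom logic before delegating createTable to the built-in session catalog:

      import java.util
      import org.apache.spark.sql.connector.catalog.{DelegatingCatalogExtension, Identifier, Table}
      import org.apache.spark.sql.connector.expressions.Transform
      import org.apache.spark.sql.types.StructType

      // Installed with spark.sql.catalog.spark_catalog=com.example.AuditingCatalog
      class AuditingCatalog extends DelegatingCatalogExtension {
        override def createTable(
            ident: Identifier,
            schema: StructType,
            partitions: Array[Transform],
            properties: util.Map[String, String]): Table = {
          println(s"About to create table $ident")                 // custom logic first
          super.createTable(ident, schema, partitions, properties) // then delegate
        }
      }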

  5. trait Identifier extends AnyRef

    Identifies an object in a catalog.

    Identifies an object in a catalog.

    Since

    3.0.0
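
    For example, Identifier.of builds an identifier from a multi-part namespace and an object name:

      import org.apache.spark.sql.connector.catalog.Identifier

      val ident = Identifier.of(Array("prod", "db"), "events")
      ident.namespace() // Array("prod", "db")
      ident.name()      // "events"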

  6. trait NamespaceChange extends AnyRef

    NamespaceChange subclasses represent requested changes to a namespace.

    NamespaceChange subclasses represent requested changes to a namespace. These are passed to SupportsNamespaces#alterNamespace. For example,

      import NamespaceChange._
      val catalog = Catalogs.load(name)
      catalog.asNamespaceCatalog.alterNamespace(ident,
          setProperty("prop", "value"),
          removeProperty("other_prop")
        )
    

    Since

    3.0.0

  7. trait SessionConfigSupport extends TableProvider

    A mix-in interface for TableProvider.

    A mix-in interface for TableProvider. Data sources can implement this interface to propagate session configs with the specified key-prefix to all data source operations in this session.

    Since

    3.0.0

  8. trait StagedTable extends Table

    Represents a table which is staged for being committed to the metastore.

    Represents a table which is staged for being committed to the metastore.

    This is used to implement atomic CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT queries. The planner will create one of these via StagingTableCatalog#stageCreate(Identifier, StructType, Transform[], Map) or StagingTableCatalog#stageReplace(Identifier, StructType, Transform[], Map) to prepare the table for being written to. This table should usually implement SupportsWrite. A new writer will be constructed via SupportsWrite#newWriteBuilder(LogicalWriteInfo), and the write will be committed. The job concludes with a call to #commitStagedChanges(), at which point implementations are expected to commit the table's metadata into the metastore along with the data that was written by the writes from the write builder this table created.

    Since

    3.0.0

  9. trait StagingTableCatalog extends TableCatalog

    An optional mix-in for implementations of TableCatalog that support staging the creation of a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.

    An optional mix-in for implementations of TableCatalog that support staging the creation of a table before committing the table's metadata along with its contents in CREATE TABLE AS SELECT or REPLACE TABLE AS SELECT operations.

    It is highly recommended to implement this trait whenever possible so that CREATE TABLE AS SELECT and REPLACE TABLE AS SELECT operations are atomic. For example, when one runs a REPLACE TABLE AS SELECT operation, if the catalog does not implement this trait, the planner will first drop the table via TableCatalog#dropTable(Identifier), then create the table via TableCatalog#createTable(Identifier, StructType, Transform[], Map), and then perform the write via SupportsWrite#newWriteBuilder(LogicalWriteInfo). However, if the write operation fails, the catalog will have already dropped the table, and the planner cannot roll back the dropping of the table.

    If the catalog implements this plugin, the catalog can implement the methods to "stage" the creation and the replacement of a table. After the table's BatchWrite#commit(WriterCommitMessage[]) is called, StagedTable#commitStagedChanges() is called, at which point the staged table can complete both the data write and the metadata swap operation atomically.

    Since

    3.0.0

  10. trait SupportsCatalogOptions extends TableProvider

    An interface, which TableProviders can implement, to support table existence checks and creation through a catalog, without having to use table identifiers.

    An interface, which TableProviders can implement, to support table existence checks and creation through a catalog, without having to use table identifiers. For example, when file based data sources use the DataFrameWriter.save(path) method, the option path can translate to a PathIdentifier. A catalog can then use this PathIdentifier to check the existence of a table, or whether a table can be created at a given directory.

    Since

    3.0.0

  11. trait SupportsDelete extends AnyRef

    A mix-in interface for Table delete support.

    A mix-in interface for Table delete support. Data sources can implement this interface to provide the ability to delete data that matches filter expressions from tables.

    Since

    3.0.0
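
    As a sketch (the table class is hypothetical), a table mixing in SupportsDelete receives the pushed-down DELETE condition as data source filters:

      import java.util
      import org.apache.spark.sql.connector.catalog.{SupportsDelete, Table, TableCapability}
      import org.apache.spark.sql.sources.Filter
      import org.apache.spark.sql.types.StructType

      class DeletableTable extends Table with SupportsDelete {
        override def name(): String = "deletable"
        override def schema(): StructType = StructType(Nil) // placeholder schema
        override def capabilities(): util.Set[TableCapability] = util.Collections.emptySet()

        // Delete every row matching all of the pushed filters.
        override def deleteWhere(filters: Array[Filter]): Unit = {
          // translate filters (e.g. EqualTo("id", 1)) into deletions here
        }
      }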

  12. trait SupportsNamespaces extends CatalogPlugin

    Catalog methods for working with namespaces.

    Catalog methods for working with namespaces.

    If an object such as a table, view, or function exists, its parent namespaces must also exist and must be returned by the discovery methods #listNamespaces() and #listNamespaces(String[]).

    Catalog implementations are not required to maintain the existence of namespaces independent of objects in a namespace. For example, a function catalog that loads functions using reflection and uses Java packages as namespaces is not required to support the methods to create, alter, or drop a namespace. Implementations are allowed to discover the existence of objects or namespaces without throwing NoSuchNamespaceException when no namespace is found.

    Since

    3.0.0

  13. trait SupportsRead extends Table

    A mix-in interface of Table, to indicate that it's readable.

    A mix-in interface of Table, to indicate that it's readable. This adds #newScanBuilder(CaseInsensitiveStringMap) that is used to create a scan for batch, micro-batch, or continuous processing.

    Since

    3.0.0

  14. trait SupportsWrite extends Table

    A mix-in interface of Table, to indicate that it's writable.

    A mix-in interface of Table, to indicate that it's writable. This adds #newWriteBuilder(LogicalWriteInfo) that is used to create a write for batch or streaming.

    Since

    3.0.0

  15. trait Table extends AnyRef

    An interface representing a logical structured data set of a data source.

    An interface representing a logical structured data set of a data source. For example, the implementation can be a directory on the file system, a topic of Kafka, or a table in the catalog, etc.

    This interface can mixin SupportsRead and SupportsWrite to provide data reading and writing ability.

    The default implementation of #partitioning() returns an empty array of partitions, and the default implementation of #properties() returns an empty map. These should be overridden by implementations that support partitioning and table properties.

    Since

    3.0.0
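
    A minimal read-only Table sketch (names here are hypothetical; a real source would also mix in SupportsRead and return a ScanBuilder):

      import java.util
      import org.apache.spark.sql.connector.catalog.{Table, TableCapability}
      import org.apache.spark.sql.types.{StringType, StructField, StructType}

      class EventsTable extends Table {
        override def name(): String = "events"

        override def schema(): StructType =
          StructType(Seq(StructField("value", StringType)))

        // Advertise batch-read support via Table#capabilities().
        override def capabilities(): util.Set[TableCapability] =
          util.EnumSet.of(TableCapability.BATCH_READ)
      }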

  16. sealed abstract final class TableCapability extends Enum[TableCapability]

    Capabilities that can be provided by a Table implementation.

    Capabilities that can be provided by a Table implementation.

    Tables use Table#capabilities() to return a set of capabilities. Each capability signals to Spark that the table supports a feature identified by the capability. For example, returning #BATCH_READ allows Spark to read from the table using a batch scan.

    Since

    3.0.0

  17. trait TableCatalog extends CatalogPlugin

    Catalog methods for working with Tables.

    Catalog methods for working with Tables.

    TableCatalog implementations may be case sensitive or case insensitive. Spark will pass table identifiers without modification. Field names passed to #alterTable(Identifier, TableChange...) will be normalized to match the case used in the table schema when updating, renaming, or dropping existing columns when catalyst analysis is case insensitive.

    Since

    3.0.0

  18. trait TableChange extends AnyRef

    TableChange subclasses represent requested changes to a table.

    TableChange subclasses represent requested changes to a table. These are passed to TableCatalog#alterTable. For example,

      import TableChange._
      val catalog = Catalogs.load(name)
      catalog.asTableCatalog.alterTable(ident,
          addColumn("x", IntegerType),
          renameColumn("a", "b"),
          deleteColumn("c")
        )
    

    Since

    3.0.0

  19. trait TableProvider extends AnyRef

    The base interface for v2 data sources which don't have a real catalog.

    The base interface for v2 data sources which don't have a real catalog. Implementations must have a public, 0-arg constructor.

    Note that TableProvider can only apply data operations to existing tables, such as read, append, delete, and overwrite. It does not support operations that require metadata changes, such as creating or dropping tables.

    The major responsibility of this interface is to return a Table for read/write.

    Since

    3.0.0
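
    As a sketch (the provider class is hypothetical), a TableProvider infers a schema from options and returns a Table:

      import java.util
      import org.apache.spark.sql.connector.catalog.{Table, TableProvider}
      import org.apache.spark.sql.connector.expressions.Transform
      import org.apache.spark.sql.types.{StringType, StructField, StructType}
      import org.apache.spark.sql.util.CaseInsensitiveStringMap

      class SimpleProvider extends TableProvider {
        // Derive the table schema from the read/write options.
        override def inferSchema(options: CaseInsensitiveStringMap): StructType =
          StructType(Seq(StructField("value", StringType)))

        override def getTable(
            schema: StructType,
            partitioning: Array[Transform],
            properties: util.Map[String, String]): Table =
          ??? // construct and return a Table implementation here
      }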
