Class TableLoadConfig


  • public class TableLoadConfig
    extends Object
    BigQuery (BQ) table load configuration. Serves both local file loading and remote (cloud storage) file loading into BQ. Instances are immutable. The two modes differ as follows:
    1) Source URIs - for a remote load use "gs://..." paths only; for a local load use full local file system paths only. The URI format determines whether the load is local or remote.
    2) Wildcards - both modes expect wildcards in the file name part of the path only. Both support the '*' wildcard, but only local load supports multiple occurrences of '*' and the '?' wildcard.
    3) Parallelism - local load is serial and less efficient than remote load, so prefer remote load for large data.
    4) Compression - remote load supports only gzip; local load also supports zstd. Files must have the proper suffix to be identified correctly.
    Author:
    Eyal Schneider
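The compression rule (item 4 in the class description) can be pictured with the sketch below, which infers the compression from a file suffix and checks it against the load mode. The `.gz` and `.zst` suffixes, the case-insensitive match, the `Compression` enum and both helper methods are assumptions made for illustration; the description only states that files need "the proper suffix".

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of TableLoadConfig. */
final class CompressionRules {
    enum Compression { NONE, GZIP, ZSTD }

    // Assumed suffixes: ".gz" for gzip and ".zst" for zstd (matched case-insensitively).
    static Compression fromSuffix(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gz")) return Compression.GZIP;
        if (lower.endsWith(".zst")) return Compression.ZSTD;
        return Compression.NONE;
    }

    // Remote load accepts gzip only; local load accepts gzip and zstd.
    // Uncompressed files are accepted in both modes.
    static boolean isSupported(boolean remoteLoad, Compression compression) {
        return compression != Compression.ZSTD || !remoteLoad;
    }
}
```

Under these assumptions, a `gs://` URI ending in `.zst` would be rejected, while the same file loaded from a local path would be accepted.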
    • Method Detail

      • newBuilder

        public static TableLoadConfig.Builder newBuilder(Set<String> sourceURIs,
                                                         com.google.cloud.bigquery.TableId destinationTableReference)
        Parameters:
        sourceURIs - the full paths to the source data. Each URI should be fully qualified, and may contain one wildcard ('*') in the file name part of the path. When loading local files, all URIs should have the form of a full local file system path. For remote loading they should all be valid cloud storage paths.
        destinationTableReference - The details of the table to load data into. In case of a partitioned table, use setPartition(..) to set the partition to load into.
        Returns:
        A builder initialized with the given sources and target, having default values in the other fields.
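A minimal sketch of how the wildcard rules from the class description could be checked before building a configuration. `WildcardRules` is a hypothetical helper, not part of this library, and it assumes '/' as the path separator; the library's actual validation may differ.

```java
/** Hypothetical validator for illustration; not part of TableLoadConfig. */
final class WildcardRules {
    static boolean isValid(String uri, boolean remoteLoad) {
        // Split into the directory part and the file name part (assumes '/' separators).
        int lastSlash = uri.lastIndexOf('/');
        String dir = uri.substring(0, lastSlash + 1);
        String fileName = uri.substring(lastSlash + 1);

        // Wildcards are expected in the file name part only.
        // '?' is treated as a wildcard for local loads only.
        if (dir.indexOf('*') >= 0) return false;
        if (!remoteLoad && dir.indexOf('?') >= 0) return false;

        if (remoteLoad) {
            // Remote load: at most one '*' in the file name.
            long stars = fileName.chars().filter(ch -> ch == '*').count();
            return stars <= 1;
        }
        // Local load: any number of '*' and '?' in the file name.
        return true;
    }
}
```

For example, `gs://bucket/data/part-*.csv` passes as a remote source, while `gs://bucket/data/*-part-*.csv` does not; the local path `/data/in/*-part-??.csv` passes.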
      • newBuilder

        public static TableLoadConfig.Builder newBuilder(String sourceURI,
                                                         com.google.cloud.bigquery.TableId destinationTableReference)
        Parameters:
        sourceURI - the full path to the source file. The URI should be fully qualified, and may contain one wildcard ('*') in the file name part of the path. When loading a local file, the URI should have the form of a full local file system path. For remote loading it should be a valid cloud storage path.
        destinationTableReference - The details of the table to load data into. In case of a partitioned table, use setPartition(..) to set the partition to load into.
        Returns:
        A builder initialized with the given source and target, having default values in the other fields.
      • getSourceURIs

        public Set<String> getSourceURIs()
        Returns:
        the full paths to the source data. Each URI should be fully qualified, and may contain one wildcard ('*') in the file name part of the path. When loading local files, all URIs should have the form of a full local file system path. For remote loading they should all be valid cloud storage paths.
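To show what '*' and '?' in the file name part of a local path can match, the sketch below uses the JDK's glob `PathMatcher`. This is only an illustration: how the library expands local wildcards is not documented here, and JDK globs also give special meaning to characters such as '[', '{' and '\', which the library may not. `LocalGlobDemo` is a hypothetical name.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Hypothetical demo for illustration; not part of TableLoadConfig. */
final class LocalGlobDemo {
    // Returns true when the file name matches the wildcard pattern.
    // '*' matches zero or more characters, '?' matches exactly one.
    static boolean matches(String fileNamePattern, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + fileNamePattern);
        return matcher.matches(Paths.get(fileName));
    }
}
```

With the pattern `part-*-??.csv`, the name `part-2024-01.csv` matches and `part-2024-001.csv` does not, because `??` stands for exactly two characters.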
      • isRemoteLoad

        public boolean isRemoteLoad()
        Returns:
        true if and only if this configuration is for a remote (cloud storage) file loading into BQ. This is determined by inspecting the form of the source URIs.
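A sketch of how the load mode could be derived from the form of the source URIs, assuming that a remote URI is recognized by the "gs://" prefix. `SourceKind` is a hypothetical helper, and rejecting empty or mixed sets is an assumption; the actual behavior for such input is not documented here.

```java
import java.util.Set;

/** Hypothetical helper for illustration; not part of TableLoadConfig. */
final class SourceKind {
    static final String REMOTE_PREFIX = "gs://";

    // Returns true for an all-remote set and false for an all-local set.
    // Empty and mixed sets are rejected (an assumption of this sketch).
    static boolean isRemoteLoad(Set<String> sourceURIs) {
        if (sourceURIs.isEmpty()) {
            throw new IllegalArgumentException("No source URIs given");
        }
        long remote = sourceURIs.stream().filter(u -> u.startsWith(REMOTE_PREFIX)).count();
        if (remote != 0 && remote != sourceURIs.size()) {
            throw new IllegalArgumentException("Mixed local and remote source URIs");
        }
        return remote > 0;
    }
}
```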
      • getDestinationTableReference

        public com.google.cloud.bigquery.TableId getDestinationTableReference()
        Returns:
        the destination table reference
      • getDestinationTablePartition

        public LocalDate getDestinationTablePartition()
        Returns:
        The destination table partition to write to, as a date. Should be specified only for partitioned tables.
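BigQuery addresses a single daily partition with a decorator of the form `table$YYYYMMDD`. The sketch below shows how a `LocalDate` partition could be turned into such a decorated table name; whether this library builds the name this way internally is an assumption, and `PartitionDecorator` is a hypothetical helper.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for illustration; not part of TableLoadConfig. */
final class PartitionDecorator {
    // Appends "$yyyyMMdd" to the table name, or returns the name unchanged
    // when no partition is set (non-partitioned table).
    static String decorate(String tableName, LocalDate partition) {
        if (partition == null) {
            return tableName;
        }
        return tableName + "$" + partition.format(DateTimeFormatter.BASIC_ISO_DATE);
    }
}
```

For example, `decorate("events", LocalDate.of(2024, 1, 31))` yields `events$20240131`.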
      • getCsvFieldDelimiter

        public String getCsvFieldDelimiter()
        Returns:
        the input file field delimiter. Applies to CSV format only. Default is ",".
      • getCSVHasHeader

        public boolean getCSVHasHeader()
        Relevant for CSV format only.
        Returns:
        True if and only if the CSV file has a header line to skip. Default is true.
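The effect of the two CSV options can be pictured with a deliberately naive reader: the delimiter splits each line into fields, and the header flag skips the first line. This sketch ignores quoting and escaping, which a real CSV load handles; `CsvOptionsDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical demo for illustration; not part of TableLoadConfig. */
final class CsvOptionsDemo {
    static List<String[]> parse(List<String> lines, String delimiter, boolean hasHeader) {
        List<String[]> rows = new ArrayList<>();
        // Quote the delimiter so characters like '|' are not read as regex syntax.
        String regex = Pattern.quote(delimiter);
        // Skip the first line when the file has a header.
        for (int i = hasHeader ? 1 : 0; i < lines.size(); i++) {
            // Limit -1 keeps trailing empty fields.
            rows.add(lines.get(i).split(regex, -1));
        }
        return rows;
    }
}
```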
      • getTableSchema

        public com.google.cloud.bigquery.Schema getTableSchema()
        Returns:
        the destination table schema. The schema can be omitted if the destination table already exists. If specified, it can serve for adding columns dynamically.
      • getDestinationTableExpirationHs

        public Integer getDestinationTableExpirationHs()
        Returns:
        the destination table expiration in hours. Null means no expiration.
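BigQuery table metadata expresses expiration as an absolute time in milliseconds since the epoch, so a relative value in hours presumably has to be converted at some point. The sketch below shows one such conversion, keeping null as "no expiration"; `ExpirationDemo` is a hypothetical helper and not necessarily how this library does it.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper for illustration; not part of TableLoadConfig. */
final class ExpirationDemo {
    // Returns the absolute expiration time as epoch milliseconds,
    // or null when no expiration is configured.
    static Long expirationEpochMs(Integer expirationHs, Instant now) {
        if (expirationHs == null) {
            return null;
        }
        return now.plus(Duration.ofHours(expirationHs)).toEpochMilli();
    }
}
```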
      • getCreateDisposition

        public com.google.cloud.bigquery.JobInfo.CreateDisposition getCreateDisposition()
        Returns:
        The table creation mode. Defines how the command behaves when the table to load into does not exist yet. Default is CREATE_IF_NEEDED.
      • getWriteDisposition

        public com.google.cloud.bigquery.JobInfo.WriteDisposition getWriteDisposition()
        Returns:
        The mode defining how the command deals with existing rows in the target table. Default is WRITE_APPEND.
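In BigQuery, the create disposition governs what happens when the destination table is missing, and the write disposition governs what happens to rows already in it. The in-memory model below illustrates how the two combine. The enum constants mirror BigQuery's; the `DispositionDemo` class and its map of tables are hypothetical stand-ins, not this library's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model for illustration; not part of TableLoadConfig. */
final class DispositionDemo {
    enum CreateDisposition { CREATE_IF_NEEDED, CREATE_NEVER }
    enum WriteDisposition { WRITE_APPEND, WRITE_TRUNCATE, WRITE_EMPTY }

    // Stand-in for a dataset: table name -> rows.
    final Map<String, List<String>> tables = new HashMap<>();

    void load(String table, List<String> newRows, CreateDisposition create, WriteDisposition write) {
        List<String> rows = tables.get(table);
        if (rows == null) {
            // The table is missing: create it, or fail under CREATE_NEVER.
            if (create == CreateDisposition.CREATE_NEVER) {
                throw new IllegalStateException("Table not found: " + table);
            }
            rows = new ArrayList<>();
            tables.put(table, rows);
        }
        switch (write) {
            case WRITE_TRUNCATE:
                rows.clear(); // replace existing rows
                break;
            case WRITE_EMPTY:
                if (!rows.isEmpty()) { // only an empty table may be written
                    throw new IllegalStateException("Table not empty: " + table);
                }
                break;
            case WRITE_APPEND:
                break; // keep existing rows
        }
        rows.addAll(newRows);
    }
}
```

With the defaults named above (CREATE_IF_NEEDED and WRITE_APPEND), repeated loads into the same table accumulate rows.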
      • getAllowJaggedRows

        public boolean getAllowJaggedRows()
        Returns:
        true if and only if jagged rows are allowed. Jagged rows are rows that are missing optional columns (trailing columns only). When true, the missing values are treated as nulls. When false, missing values are considered an error. Default is false.
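The jagged-row rule can be sketched as follows: a row with fewer fields than the schema is either padded with nulls or rejected, depending on the flag. `JaggedRowsDemo` is a hypothetical helper, and rejecting rows with too many fields is an assumption of this sketch.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of TableLoadConfig. */
final class JaggedRowsDemo {
    static String[] normalize(String[] row, int columnCount, boolean allowJaggedRows) {
        if (row.length > columnCount) {
            throw new IllegalArgumentException("Too many columns");
        }
        if (row.length < columnCount && !allowJaggedRows) {
            throw new IllegalArgumentException("Missing trailing columns");
        }
        // Arrays.copyOf pads the missing trailing positions with null.
        return Arrays.copyOf(row, columnCount);
    }
}
```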
      • getClusteringFields

        public Set<String> getClusteringFields()
        Returns:
        The set of names of table fields defined as clustering fields. Mandatory when the table has clustering fields.
      • getTimeoutMs

        public Long getTimeoutMs()
        Returns:
        The load execution timeout, in milliseconds. Null means no timeout (default value). NOTE: Google's API doesn't always seem to respect this limit, and it's not always clear which timeout applies: the load-level timeout here or the global one provided in the BigQueryConnector's constructor.
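Because the API may not enforce the limit, a caller that needs a hard bound could apply one on its own side. The sketch below waits on a generic `Future` with an optional timeout, treating null as "wait indefinitely". `TimeoutDemo` and the use of a `Future` to represent the load are assumptions; how this library actually runs the load is not documented here.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of TableLoadConfig. */
final class TimeoutDemo {
    // Waits for the job result. A null timeout means no limit;
    // otherwise a TimeoutException is thrown once the limit elapses.
    static <T> T await(Future<T> job, Long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        if (timeoutMs == null) {
            return job.get();
        }
        return job.get(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```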