Package org.pipecraft.infra.bq
Class BigQueryConnector
- java.lang.Object
-
- org.pipecraft.infra.bq.BigQueryConnector
-
- All Implemented Interfaces:
org.pipecraft.infra.monitoring.JsonMonitorable
public class BigQueryConnector extends Object implements org.pipecraft.infra.monitoring.JsonMonitorable
Used for interacting with Google's BigQuery. Supports:
1) Initialization with environment credentials (no option to set credentials explicitly yet)
2) Select and DML queries (through BQQuery and BQDMLQuery)
3) Async query executions
4) Writing query results to specific BQ tables
5) Exporting results to Google Storage, asynchronously
6) Loading local/Google Storage files into BQ tables, asynchronously
7) Monitoring of all operations
8) Limiting the number of concurrent BQ operations, using constructor parameters
- Author:
- Eyal Schneider
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static class    BigQueryConnector.BQExportFuture
static class    BigQueryConnector.BQQueryResultFuture<R,F>
static class    BigQueryConnector.BQTableLoadFuture
-
Constructor Summary
Constructors
Constructor    Description
BigQueryConnector(String projectId, long connTimeoutMs, long readTimeoutMs, QueryExecutionConfig defaultExecutionConfig, Consumer<BQQueryExecutionSummary> observer, ExecutorService ex)    Constructor
-
Method Summary
Modifier and Type    Method    Description
<R,F> BQResultsIterator<R,F>    execute(BQQuery<R,F> query)    Runs a query synchronously.
<R,F> BQResultsIterator<R,F>    execute(BQQuery<R,F> query, QueryExecutionConfig config)    Runs a query synchronously.
<R,F> BigQueryConnector.BQQueryResultFuture<R,F>    executeAsync(BQQuery<R,F> query)    Runs a query asynchronously, returning a future, which is both checked and listenable.
<R,F> BigQueryConnector.BQQueryResultFuture<R,F>    executeAsync(BQQuery<R,F> query, QueryExecutionConfig config)    Runs a query asynchronously, returning a future, which is both checked and listenable.
void    executeNoStreaming(BQQuery<?,?> query)    Runs a query synchronously without streaming results back.
long    executeNoStreaming(BQQuery<?,?> query, QueryExecutionConfig config)    Runs a query synchronously without streaming results back.
org.pipecraft.infra.concurrent.CheckedFuture<Void,BQException>    executeNoStreamingAsync(BQQuery<?,?> query)    Runs a query asynchronously without streaming results back.
<R,F> BigQueryConnector.BQQueryResultFuture<R,F>    executeNoStreamingAsync(BQQuery<R,F> query, QueryExecutionConfig config)    Runs a query asynchronously without streaming results back.
void    exportTable(TableExportConfig config)    Runs an export job, synchronously
BigQueryConnector.BQExportFuture    exportTableAsync(TableExportConfig config)    Runs an export job, asynchronously
Map<String,? extends org.pipecraft.infra.monitoring.JsonMonitorable>    getChildren()
QueryExecutionConfig    getDefaultQueryExecutionConfig()
ExecutorService    getExecutorService()
net.minidev.json.JSONObject    getOwnMetrics()
String    getProjectId()
void    loadTable(TableLoadConfig tableLoadConfig)    Runs a synchronous table load
BigQueryConnector.BQTableLoadFuture    loadTableAsync(TableLoadConfig tableLoadConfig)    Runs an async table load job
boolean    tableExists(String dataset, String table)
void    updateTableExpiration(String datasetId, String tableName, Integer duration, TimeUnit timeUnit)    Sets table's expiration time.
-
-
-
Constructor Detail
-
BigQueryConnector
public BigQueryConnector(String projectId, long connTimeoutMs, long readTimeoutMs, QueryExecutionConfig defaultExecutionConfig, Consumer<BQQueryExecutionSummary> observer, ExecutorService ex) throws IOException
Constructor
- Parameters:
projectId - The project that this instance is bound to. All actions will be performed in the scope of this project.
connTimeoutMs - Timeout (in milliseconds) for connection establishment.
readTimeoutMs - Socket read timeout (in milliseconds) of all requests. This low-level timeout defines how long a blocking read on the socket should wait for data. NOTE: Google's API doesn't seem to always respect this limit, and it's not always clear which timeout applies (the query-level timeout or this global one, provided in the BigQueryConnector's constructor).
defaultExecutionConfig - The default query execution configuration, used when none is specified when calling the execution methods.
observer - An observer notified of BQ query executions. Can be null.
ex - The executor to run BQ requests on. Can be a multi-threaded or direct executor, depending on the required threading policy. It's the responsibility of the caller to shut down this executor.
- Throws:
IOException - In case the connector can't be initialized
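The executor ownership rule above can be illustrated with plain JDK classes. The following is a minimal sketch, not the connector's implementation: it assumes that a fixed-size pool is one way to cap how many requests run at once, and it uses a sleeping task as a stand-in for a BQ request. The class and method names (ExecutorOwnershipSketch, peakConcurrency) are hypothetical. The constructor call appears only in a comment, with the argument order taken from the signature above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorOwnershipSketch {

    /**
     * Runs taskCount stand-in tasks on a caller-owned pool of poolSize threads
     * and returns the highest number of tasks observed running at once.
     */
    public static int peakConcurrency(int poolSize, int taskCount) {
        ExecutorService ex = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            // In real use, ex would be passed as the last constructor argument:
            // new BigQueryConnector(projectId, connTimeoutMs, readTimeoutMs,
            //                       defaultExecutionConfig, observer, ex)
            for (int i = 0; i < taskCount; i++) {
                ex.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for a BQ request (assumption)
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                });
            }
        } finally {
            // The caller created the executor, so the caller shuts it down.
            ex.shutdown();
            try {
                ex.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + peakConcurrency(2, 8));
    }
}
```

With a pool of two threads, the observed peak never exceeds two, regardless of how many tasks are submitted.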
-
-
Method Detail
-
getProjectId
public String getProjectId()
- Returns:
- The project that this instance is bound to. All actions will be performed in the scope of this project.
-
getDefaultQueryExecutionConfig
public QueryExecutionConfig getDefaultQueryExecutionConfig()
- Returns:
- the default query execution configuration. Use toBuilder().setXXX().setYYY().build() to create a copy with a few settings changed.
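The toBuilder() idiom mentioned above copies an immutable configuration and overrides only selected settings. The sketch below shows the general pattern with a hypothetical stand-in class. Config, its fields (destinationTable, timeoutMs) and its setters are assumptions for illustration and are not the actual members of QueryExecutionConfig.

```java
public class ToBuilderSketch {

    /** Hypothetical immutable config; NOT the real QueryExecutionConfig. */
    public static final class Config {
        private final String destinationTable;
        private final long timeoutMs;

        private Config(Builder b) {
            this.destinationTable = b.destinationTable;
            this.timeoutMs = b.timeoutMs;
        }

        public String getDestinationTable() { return destinationTable; }

        public long getTimeoutMs() { return timeoutMs; }

        public static Builder builder() { return new Builder(); }

        /** Returns a builder pre-populated with this instance's settings. */
        public Builder toBuilder() {
            return new Builder()
                .setDestinationTable(destinationTable)
                .setTimeoutMs(timeoutMs);
        }
    }

    /** Hypothetical builder with illustrative setters. */
    public static final class Builder {
        private String destinationTable;
        private long timeoutMs;

        public Builder setDestinationTable(String destinationTable) {
            this.destinationTable = destinationTable;
            return this;
        }

        public Builder setTimeoutMs(long timeoutMs) {
            this.timeoutMs = timeoutMs;
            return this;
        }

        public Config build() { return new Config(this); }
    }

    public static void main(String[] args) {
        Config defaults = Config.builder()
            .setDestinationTable("dataset.table")
            .setTimeoutMs(1000L)
            .build();
        // Copy the defaults, changing a single setting; the original is untouched.
        Config slower = defaults.toBuilder().setTimeoutMs(5000L).build();
        System.out.println(defaults.getTimeoutMs() + " -> " + slower.getTimeoutMs());
    }
}
```

Because the original object is immutable, the connector's default configuration can be shared safely while individual calls use modified copies.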
-
getExecutorService
public ExecutorService getExecutorService()
- Returns:
- The executor provided in the constructor (and owned by the caller)
-
executeAsync
public <R,F> BigQueryConnector.BQQueryResultFuture<R,F> executeAsync(BQQuery<R,F> query, QueryExecutionConfig config)
Runs a query asynchronously, returning a future which is both checked and listenable. Note that even after the future completes successfully and provides its value, the result is not final: iterating over the result set may still produce errors, since pages are requested from the server during iteration. The caller may set a destination table reference and table expiration time in the supplied query config object.
- Parameters:
query - The query to execute
config - The execution configuration
- Returns:
- The future providing the query result or the query execution exception. This future is both checked and listenable (see CheckedFuture and ListenableFuture). Upon success, the future provides an iterator over the result rows. Note that the iterator's next() method may throw a QueryResultBrokenException if the connection with BQ breaks during result set streaming. In the case of a DML (Data Manipulation Language) query, the future returns an empty iterator.
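The two failure points described above (the future itself, and later iteration) can be sketched with JDK types only. This is an illustration under stated assumptions, not the connector's API: CompletableFuture stands in for BQQueryResultFuture, a plain Iterator<String> stands in for BQResultsIterator, and ResultBrokenException is a hypothetical unchecked stand-in for QueryResultBrokenException.

```java
import java.util.Iterator;
import java.util.concurrent.CompletableFuture;

public class LazyResultsSketch {

    /** Hypothetical stand-in for QueryResultBrokenException (assumed unchecked here). */
    public static final class ResultBrokenException extends RuntimeException {
        public ResultBrokenException(String message) { super(message); }
    }

    /** Stand-in iterator: serves goodRows rows, then fails as if a page request broke. */
    static Iterator<String> flakyRows(int goodRows) {
        return new Iterator<String>() {
            private int served = 0;

            @Override
            public boolean hasNext() { return true; }

            @Override
            public String next() {
                if (served >= goodRows) {
                    throw new ResultBrokenException("connection lost after " + served + " rows");
                }
                return "row-" + served++;
            }
        };
    }

    /** Returns how many rows were read before iteration failed. */
    public static int countRowsBeforeBreak(int goodRows) {
        // The future completes successfully: the query itself "succeeded".
        CompletableFuture<Iterator<String>> future =
            CompletableFuture.completedFuture(flakyRows(goodRows));
        Iterator<String> rows = future.join();

        int count = 0;
        try {
            while (rows.hasNext()) {
                rows.next();
                count++;
            }
        } catch (ResultBrokenException e) {
            // Iteration can still fail after a successful future;
            // only the rows read so far are available.
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("rows read before failure: " + countRowsBeforeBreak(3));
    }
}
```

The practical consequence is that error handling belongs around the iteration loop as well as around the call that obtains the iterator.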
-
execute
public <R,F> BQResultsIterator<R,F> execute(BQQuery<R,F> query, QueryExecutionConfig config) throws InterruptedException, BQException
Runs a query synchronously. Note that even after the call returns successfully and provides its value, the result is not final: iterating over the result set may still produce errors, since pages are requested from the server during iteration. The caller may set a destination table reference and table expiration time in the supplied query config object.
- Parameters:
query - The query to execute
config - The execution configuration
- Returns:
- The results iterator
- Throws:
BQException
InterruptedException
-
executeNoStreamingAsync
public <R,F> BigQueryConnector.BQQueryResultFuture<R,F> executeNoStreamingAsync(BQQuery<R,F> query, QueryExecutionConfig config)
Runs a query asynchronously without streaming results back. Recommended for use only for DML queries or for queries which dump results to a table anyway.
- Parameters:
query - The query to execute
config - The execution configuration
- Returns:
- The future to use for determining completion and completion type (successful/failed).
-
executeNoStreaming
public long executeNoStreaming(BQQuery<?,?> query, QueryExecutionConfig config) throws InterruptedException, BQException
Runs a query synchronously without streaming results back. Recommended for use only for DML queries or for queries which dump results to a table anyway.
- Parameters:
query - The query to execute
config - The execution configuration
- Returns:
- the record count
- Throws:
BQException
InterruptedException
-
executeAsync
public <R,F> BigQueryConnector.BQQueryResultFuture<R,F> executeAsync(BQQuery<R,F> query)
Runs a query asynchronously, returning a future which is both checked and listenable. Note that even after the future completes successfully and provides its value, the result is not final: iterating over the result set may still produce errors, since pages are requested from the server during iteration. Uses the default query execution config as defined in the constructor.
- Parameters:
query - The query to execute
- Returns:
- The future providing the query result or the query execution exception. Upon success, the future provides an iterator over the result rows. Note that the iterator's next() method may throw a QueryResultBrokenException if the connection with BQ breaks during result set streaming. In the case of a DML (Data Manipulation Language) query, the future returns an empty iterator.
-
execute
public <R,F> BQResultsIterator<R,F> execute(BQQuery<R,F> query) throws InterruptedException, BQException
Runs a query synchronously. Note that even after the call returns successfully and provides its value, the result is not final: iterating over the result set may still produce errors, since pages are requested from the server during iteration. Uses the default query execution config as defined in the constructor.
- Parameters:
query - The query to execute
- Returns:
- The results iterator
- Throws:
BQException
InterruptedException
-
executeNoStreamingAsync
public org.pipecraft.infra.concurrent.CheckedFuture<Void,BQException> executeNoStreamingAsync(BQQuery<?,?> query)
Runs a query asynchronously without streaming results back. Recommended for use only for DML queries or for queries which dump results to a table anyway.
- Parameters:
query - The query to execute
- Returns:
- The future to use for determining completion and completion type (successful/failed).
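The idea of a future that is both "checked" and "listenable" can be sketched in general terms. The example below is not pipecraft's CheckedFuture: the Checked wrapper, its checkedGet() and addListener() methods, and JobException are all hypothetical names. It only shows the concept of surfacing a failure as one specific checked exception type, plus a completion callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.function.Function;

public class CheckedFutureSketch {

    /** Hypothetical checked exception, standing in for a type such as BQException. */
    public static final class JobException extends Exception {
        public JobException(String message, Throwable cause) { super(message, cause); }
    }

    /** Hypothetical wrapper: failures surface as the checked type E. */
    public static final class Checked<V, E extends Exception> {
        private final CompletableFuture<V> delegate;
        private final Function<Throwable, E> mapper;

        public Checked(CompletableFuture<V> delegate, Function<Throwable, E> mapper) {
            this.delegate = delegate;
            this.mapper = mapper;
        }

        /** Blocks for the result, translating any failure cause into E. */
        public V checkedGet() throws E, InterruptedException {
            try {
                return delegate.get();
            } catch (ExecutionException e) {
                throw mapper.apply(e.getCause());
            }
        }

        /** "Listenable": runs the callback once the future completes either way. */
        public void addListener(Runnable listener) {
            delegate.whenComplete((value, error) -> listener.run());
        }
    }

    /** Completes a future with success or failure and reports the completion type. */
    public static String outcome(boolean fail) {
        CompletableFuture<Void> raw = new CompletableFuture<>();
        Checked<Void, JobException> future =
            new Checked<>(raw, cause -> new JobException("job failed", cause));
        if (fail) {
            raw.completeExceptionally(new IllegalStateException("quota exceeded"));
        } else {
            raw.complete(null);
        }
        try {
            future.checkedGet();
            return "ok";
        } catch (JobException e) {
            return "failed: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(false));
        System.out.println(outcome(true));
    }
}
```

For a future that carries no value, as with a Void result, the only information it conveys is whether and how the operation completed.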
-
executeNoStreaming
public void executeNoStreaming(BQQuery<?,?> query) throws BQException, InterruptedException
Runs a query synchronously without streaming results back. Recommended for use only for DML queries or for queries which dump results to a table anyway.
- Parameters:
query - The query to execute
- Throws:
BQException
InterruptedException
-
tableExists
public boolean tableExists(String dataset, String table)
- Parameters:
dataset - A dataset name
table - A table name
- Returns:
- true iff the table exists
-
updateTableExpiration
public void updateTableExpiration(String datasetId, String tableName, Integer duration, TimeUnit timeUnit)
Sets the table's expiration time. The table must exist.
- Parameters:
datasetId - the table's dataset id
tableName - the table's name
duration - the number of time units until the table's deletion, measured from now. Must be greater than 0. Null means infinite (no expiration).
timeUnit - the time unit of the given duration.
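The duration semantics above (relative to now, positive, null meaning no expiration) can be made concrete with a small calculation. This is a sketch of one reading of those rules, not the connector's code. The helper name expirationMillis and its nowMillis parameter are hypothetical, and throwing IllegalArgumentException for non-positive values is an assumption.

```java
import java.util.concurrent.TimeUnit;

public class ExpirationSketch {

    /**
     * Computes an absolute expiration timestamp in epoch milliseconds.
     * Returns null when duration is null, meaning the table never expires.
     */
    public static Long expirationMillis(Integer duration, TimeUnit timeUnit, long nowMillis) {
        if (duration == null) {
            return null; // infinite: no expiration
        }
        if (duration <= 0) {
            // Assumed behavior; the source only states the value must be greater than 0.
            throw new IllegalArgumentException("duration must be greater than 0");
        }
        return nowMillis + timeUnit.toMillis(duration);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println("expires at: " + expirationMillis(7, TimeUnit.DAYS, now));
        System.out.println("no expiration: " + expirationMillis(null, TimeUnit.DAYS, now));
    }
}
```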
-
getOwnMetrics
public net.minidev.json.JSONObject getOwnMetrics()
- Specified by:
getOwnMetrics in interface org.pipecraft.infra.monitoring.JsonMonitorable
-
getChildren
public Map<String,? extends org.pipecraft.infra.monitoring.JsonMonitorable> getChildren()
- Specified by:
getChildren in interface org.pipecraft.infra.monitoring.JsonMonitorable
-
exportTableAsync
public BigQueryConnector.BQExportFuture exportTableAsync(TableExportConfig config)
Runs an export job, asynchronously.
- Parameters:
config - The export configuration
- Returns:
- The export future, which is checked and listenable
-
exportTable
public void exportTable(TableExportConfig config) throws InterruptedException, BQException
Runs an export job, synchronously.
- Parameters:
config - The export configuration
- Throws:
BQException
InterruptedException
-
loadTableAsync
public BigQueryConnector.BQTableLoadFuture loadTableAsync(TableLoadConfig tableLoadConfig)
Runs an async table load job.
- Parameters:
tableLoadConfig - the load job configuration. Supports loading from a local file or from a remote file in cloud storage.
- Returns:
- The future for this async operation. This future is both checked and listenable.
-
loadTable
public void loadTable(TableLoadConfig tableLoadConfig) throws InterruptedException, BQException
Runs a synchronous table load.
- Parameters:
tableLoadConfig - the load job configuration. Supports loading from a local file or from a remote file in cloud storage.
- Throws:
BQException
InterruptedException
-
-