PUSH BIGDATA
Overview
The Push BigData operation allows you to send changes previously captured to either a CosmosDB or Google BigQuery endpoint. Since CosmosDB is a document database storing JSON documents, you can optionally construct the JSON Document as needed; however, if the PAYLOAD option is not provided, a default (flat) JSON document will be constructed.
Syntax
Sends data for upserted records to a big data endpoint (CosmosDB, BigQuery).
PUSH INTO <COSMOSDB | BIGQUERY> [CONNECTION]
-- Common options
{ AUTO_CREATE }
{ FIELDS '...' }
-- CosmosDB options
{ ACCOUNT '...' }
{ DATABASE '...' }
{ TABLE '...' }
{ PARTITIONKEY '...' }
{ PAYLOAD '...' }
-- BigQuery options
{ PROJECT '...' }
{ DATASET '...' }
{ CONTAINER '...' }
{ WRITE_BEHAVIOR < 'append' | 'truncate' | 'writeifempty' > }
{ WITH
{ TIMEOUT N }
{ BATCH N }
{ DISCARD_ON_SUCCESS }
{ DISCARD_ON_ERROR }
{ DLQ_ON_ERROR '...' }
{ RETRY < LINEAR | EXPONENTIAL > (N,N) }
}
;AUTO_CREATE |
Automatically creates the target databases/tables/containers/datasets if not found |
FIELDS |
Comma-separated list of fields from the pipeline to send to the target; leave blank to push all available fields in the pipeline data |
ACCOUNT |
The CosmosDB account name (overrides the connection) |
DATABASE |
The CosmosDB database name |
TABLE |
The CosmosDB table name |
PARTITIONKEY |
The document property to use as the partition key for CosmosDB; if not provided, /id will be used by default |
PAYLOAD |
A custom payload when building complex JSON documents for CosmosDB |
PROJECT |
The Google BigQuery project name (overrides the connection) |
DATASET |
The Google BigQuery dataset name |
CONTAINER |
The Google BigQuery container name |
WRITE_BEHAVIOR |
The Google BigQuery write behavior (append,truncate,writeifempty) |
TIMEOUT |
A timeout value in seconds |
BATCH |
The number of records to process in a single call |
RETRY_INTERVAL |
The retry interval in seconds (default: 1) |
RETRY_COUNT |
The maximum number of retries to perform (default: 1) |
WITH_PAYLOAD |
When tracing is enabled, indicates that the response payload should also be logged |
DISCARD_ON_SUCCESS |
Deletes the change log after successful completion of the push operation |
DISCARD_ON_ERROR |
Deletes the change log if the push operation failed |
DLQ_ON_ERROR |
Moves the change log to a subfolder or directory if the push operation failed |
RETRY EXPONENTIAL |
Retries the operation on failure up to N times, waiting P seconds exponentially longer every time (N,P) |
RETRY LINEAR |
Retries the operation on failure up to N times, waiting P seconds every time (N,P) |
Example 1
-- Load the next inserts/updates previously captured in a cloud folder using -- CAPTURE 'mycdc' INSERT UPDATE DELETE ON KEYS 'guid' WITH PATH [adls2.0] 'container' LOAD UPSERTS FROM [adls2.0] 'mycdc' PATH '/container' KEY_COLUMNS 'guid'; -- Send all updates and inserts into a target CosmosDB table -- and create the table if it doesn't exist; build a default, flat JSON -- document using all available fields PUSH INTO COSMOSDB [cosmosDBConnection] DATABASE 'testdb' TABLE 'rssdata' PARTITIONKEY '/guid' AUTO_CREATE WITH BATCH 1000 ;
Example 2
-- Load the next inserts/updates previously captured in a cloud folder using -- CAPTURE 'mycdc' INSERT UPDATE DELETE ON KEYS 'guid' WITH PATH [adls2.0] 'container' LOAD UPSERTS FROM [adls2.0] 'mycdc' PATH '/container' KEY_COLUMNS 'guid'; -- Send all updates and inserts into a target Big Query table -- and create the table if it doesn't exist; send only the fields selected -- document using all available fields PUSH INTO BIGQUERY [bigQueryConnection] DATASET 'testdb' CONTAINER 'rssdata' WRITE_BEHAVIOR 'APPEND' FIELDS 'guid,name,url' AUTO_CREATE WITH BATCH 1000 ;