PUSH BIGDATA

Overview

The Push BigData operation allows you to send changes previously captured to either a CosmosDB or Google BigQuery endpoint. Since CosmosDB is a document database storing JSON documents, you can optionally construct the JSON Document as needed; however, if the PAYLOAD option is not provided, a default (flat) JSON document will be constructed.

Syntax

Sends data for upserted records to a big data endpoint (CosmosDB, BigQuery).

PUSH INTO <COSMOSDB | BIGQUERY> [CONNECTION]
        -- Common options
        { AUTO_CREATE }
        { FIELDS '...' }
        
        -- CosmosDB options
        { ACCOUNT '...' }
        { DATABASE '...' }
        { TABLE '...' }
        { PARTITIONKEY '...' }
        { PAYLOAD '...' }
    
        -- BigQuery options
        { PROJECT '...' }
        { DATASET '...' }
        { CONTAINER '...' }
        { WRITE_BEHAVIOR < 'append' | 'truncate' | 'writeifempty' > }
    
    { WITH
        { TIMEOUT N }
        { BATCH N }
        { DISCARD_ON_SUCCESS }
        { DISCARD_ON_ERROR }
        { DLQ_ON_ERROR '...' }
        { RETRY < LINEAR | EXPONENTIAL > (N,N) }
    }
;

AUTO_CREATE

Automatically creates the target databases/tables/containers/datasets if not found

FIELDS

Comma-separated list of fields from the pipeline to send to the target; leave blank to push all available fields in the pipeline data

ACCOUNT

The CosmosDB account name (overrides the connection)

DATABASE

The CosmosDB database name

TABLE

The CosmosDB table name

PARTITIONKEY

The document property to use as the partition key for CosmosDB; if not provided, /id will be used by default

PAYLOAD

A custom payload when building complex JSON documents for CosmosDB

PROJECT

The Google BigQuery project name (overrides the connection)

DATASET

The Google BigQuery dataset name

CONTAINER

The Google BigQuery container name

WRITE_BEHAVIOR

The Google BigQuery write behavior (append,truncate,writeifempty)

TIMEOUT

A timeout value in seconds

BATCH

The number of records to process in a single call

RETRY_INTERVAL

The retry interval in seconds (default: 1)

RETRY_COUNT

The maximum number of retries to perform (default: 1)

WITH_PAYLOAD

When tracing is enabled, indicates that the response payload should also be logged

DISCARD_ON_SUCCESS

Deletes the change log after successful completion of the push operation

DISCARD_ON_ERROR

Deletes the change log if the push operation failed

DLQ_ON_ERROR

Moves the change log to a subfolder or directory if the push operation failed

RETRY EXPONENTIAL

Retries the operation on failure up to N times, waiting P seconds exponentially longer every time (N,P)

RETRY LINEAR

Retries the operation on failure up to N times, waiting P seconds every time (N,P)

Example 1


-- Load the next inserts/updates previously captured in a cloud folder using
-- CAPTURE 'mycdc' INSERT UPDATE DELETE ON KEYS 'guid' WITH PATH [adls2.0] 'container'

LOAD UPSERTS FROM [adls2.0] 'mycdc' PATH '/container' KEY_COLUMNS 'guid';

-- Send all updates and inserts into a target CosmosDB table
-- and create the table if it doesn't exist; build a default, flat JSON 
-- document using all available fields

PUSH INTO COSMOSDB [cosmosDBConnection] 
	DATABASE 'testdb'
	TABLE 'rssdata'
	PARTITIONKEY '/guid'
	AUTO_CREATE
  WITH
	BATCH 1000
;

Example 2


-- Load the next inserts/updates previously captured in a cloud folder using
-- CAPTURE 'mycdc' INSERT UPDATE DELETE ON KEYS 'guid' WITH PATH [adls2.0] 'container'

LOAD UPSERTS FROM [adls2.0] 'mycdc' PATH '/container' KEY_COLUMNS 'guid';

-- Send all updates and inserts into a target Big Query table
-- and create the table if it doesn't exist; send only the fields selected 
-- document using all available fields

PUSH INTO BIGQUERY [bigQueryConnection] 
	DATASET 'testdb'
	CONTAINER 'rssdata'
	WRITE_BEHAVIOR 'APPEND'
	FIELDS 'guid,name,url'
	AUTO_CREATE
  WITH
	BATCH 1000
;