Exporting from Firestore to Storage
const functions = require('firebase-functions');
const firestore = require('@google-cloud/firestore');
const client = new firestore.v1.FirestoreAdminClient();

// Replace BUCKET_NAME
const bucket = 'gs://****';

exports.scheduledFirestoreExport = functions.pubsub
  .schedule('every 24 hours')
  .onRun((context) => {
    const projectId = process.env.GCP_PROJECT || process.env.GCLOUD_PROJECT;
    const databaseName = client.databasePath(projectId, '(default)');

    return client
      .exportDocuments({
        name: databaseName,
        outputUriPrefix: bucket,
        // Leave collectionIds empty to export all collections,
        // or set to a list of collection IDs to export.
        collectionIds: ['collection-id-1', 'sub-collection-id-of-1', 'collection-id-2'] // works fine
      })
      .then((responses) => {
        const response = responses[0];
        console.log(`Operation Name: ${response['name']}`);
      })
      .catch((err) => {
        console.error(err);
        throw new Error('Export operation failed');
      });
  });
I struggled quite a bit with how to specify collectionIds. It turns out you list top-level collection IDs and the IDs of collections nested beneath them side by side in the same flat array. If you specify only a nested collection ID without also specifying its parent collection's ID, nothing is exported.
You can run the export without specifying collectionIds at all, but such an export cannot be loaded from Storage into BigQuery, so specifying them is required here.
https://cloud.google.com/bigquery/docs/loading-data-cloud-firestore?hl=ja
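As a sketch of the flattening rule above (the collection names here are hypothetical, not from the original code): for a layout like `users/{userId}/orders` plus a top-level `logs`, every collection ID on the path, parent and child alike, ends up in one flat list.

```python
# Sketch of the collectionIds rule: collection IDs sit at even depths of a
# Firestore path (collection/document/collection/...), and the export API
# wants all of them, at every level, in a single flat list.

def collection_ids_for_export(paths):
    """Collect every collection ID appearing at any depth of the given paths."""
    ids = []
    for path in paths:
        for depth, segment in enumerate(path.split('/')):
            if depth % 2 == 0 and segment not in ids:  # even depth = collection
                ids.append(segment)
    return ids

print(collection_ids_for_export(['users/{userId}/orders', 'logs']))
# → ['users', 'orders', 'logs']
```

Note that 'users' must appear in the list for 'orders' to be exported at all, which is exactly the pitfall described above.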
Loading from Storage into BigQuery
import os

from google.cloud import bigquery

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './credential.json'

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "[project id].[db of bq].[table id]"

# TODO(developer): Set uri to the path of the kind export metadata
uri = (
    "gs://****-backup-all/2021-09-09T11:50:09_53192/all_namespaces/kind_***/all_namespaces_kind_****.export_metadata"
)

# TODO(developer): Set projection_fields to a list of document properties
# to import. Leave unset or set to `None` for all fields.
projection_fields = []

# Construct a BigQuery client object.
client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP,
    projection_fields=projection_fields,
    write_disposition='WRITE_TRUNCATE'
)
'''
WRITE_TRUNCATE: If the table already exists, BigQuery overwrites the table data and uses the schema from the load.
WRITE_APPEND: If the table already exists, BigQuery appends the data to the table.
WRITE_EMPTY: If the table already exists and contains data, a 'duplicate' error is returned in the job result.
'''

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
Whether to overwrite the table is controlled by the write_disposition argument in job_config = bigquery.LoadJobConfig(...):
WRITE_TRUNCATE: If the table already exists, BigQuery overwrites the table data and uses the schema from the load.
WRITE_APPEND: If the table already exists, BigQuery appends the data to the table.
WRITE_EMPTY: If the table already exists and contains data, a 'duplicate' error is returned in the job result.
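The three dispositions can be sketched as plain-Python semantics (this is a toy model of the behavior, not the BigQuery client itself):

```python
# Toy model of BigQuery write_disposition semantics for a load job
# against a table that may already contain rows.

def load_rows(existing_rows, new_rows, write_disposition):
    if write_disposition == 'WRITE_TRUNCATE':
        return list(new_rows)                        # existing data replaced
    if write_disposition == 'WRITE_APPEND':
        return list(existing_rows) + list(new_rows)  # new rows appended
    if write_disposition == 'WRITE_EMPTY':
        if existing_rows:
            raise ValueError('duplicate')            # job fails on non-empty table
        return list(new_rows)
    raise ValueError('unknown write_disposition: ' + write_disposition)

table = [{'id': 1}]
print(load_rows(table, [{'id': 2}], 'WRITE_APPEND'))    # → [{'id': 1}, {'id': 2}]
print(load_rows(table, [{'id': 2}], 'WRITE_TRUNCATE'))  # → [{'id': 2}]
```

Since this pipeline re-exports the whole database every 24 hours, WRITE_TRUNCATE is the natural choice here.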
projection_fields specifies which document properties are loaded into the table; leaving it unset (or empty, as above) loads all fields.
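The effect of projection_fields can be sketched as a simple filter over documents (property names here are hypothetical; empty/None is treated as "load everything", matching how the script above passes `[]`):

```python
# Toy model of projection_fields: only the listed document properties
# survive into the loaded table; unset/empty means all properties.

def project(documents, projection_fields=None):
    if not projection_fields:  # None or [] → load every field
        return documents
    return [{k: v for k, v in doc.items() if k in projection_fields}
            for doc in documents]

docs = [{'name': 'alice', 'age': 30, 'email': 'a@example.com'}]
print(project(docs, ['name', 'age']))  # → [{'name': 'alice', 'age': 30}]
print(project(docs))                   # → all three fields unchanged
```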