26,335 questions
0
votes
1
answer
140
views
Reorder columns in a connected data table
I have a Google Sheets connected sheet with data coming from BigQuery. Can I reorder the columns in my spreadsheet? I can't seem to drag them around.
0
votes
3
answers
137
views
Long-form record view in Looker Studio
I have a BigQuery dataset containing leads (names, email addresses, a ton of info gleaned from different sources). I have a Looker Studio dashboard that shows rows from this dataset, sorted by a ...
0
votes
0
answers
69
views
Is hard-coding the source and destination table names in my Python data pipeline code a security risk?
I am building data pipelines (ETL) using Python and BigQuery. My repository is safely stored on a GitHub-like service and the pipeline will be built to a Docker container that is later run on a ...
0
votes
0
answers
33
views
Firebase Analytics losing events in Android app: 20% of sessions have only session_start events
I have a native Android app developed with Kotlin. It uses Firebase Analytics to send custom events to GA4 (360 account), but 20% of sessions have only the standard session_start event. In other ...
0
votes
1
answer
119
views
In BigQuery, check if a substring of one column is in another column (or across tables)
The problem is trying to match MCC values ("Merchant Category Codes") based not on the code but the business name, where the names are from two sources and thus not identical. The goal is to ...
0
votes
0
answers
63
views
Improve query performance with multiple between clauses
I have a table of data (data_table) and a second table with ranges (range_table). I need an answer to which rows in the data table are not contained in the ranges table.
I expect to have most ...
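A minimal sketch of the underlying check, in Python rather than BigQuery SQL: it assumes the ranges are inclusive numeric intervals, and the names merge_ranges and values_outside, along with the sample data, are hypothetical. Merging overlapping ranges first lets each value be tested with one binary search instead of one BETWEEN comparison per range.

```python
from bisect import bisect_right
from typing import List, Tuple

Range = Tuple[int, int]

def merge_ranges(ranges: List[Range]) -> List[Range]:
    """Sort inclusive ranges and merge the ones that overlap."""
    merged: List[Range] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous range: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def values_outside(values: List[int], ranges: List[Range]) -> List[int]:
    """Return the values that fall in none of the ranges."""
    merged = merge_ranges(ranges)
    starts = [lo for lo, _ in merged]
    outside = []
    for v in values:
        # Index of the last merged range that starts at or before v.
        i = bisect_right(starts, v) - 1
        if i < 0 or v > merged[i][1]:
            outside.append(v)
    return outside

# Hypothetical sample data, not taken from the question.
data = [1, 5, 10, 15, 20, 25]
ranges = [(4, 6), (5, 12), (20, 20)]
result = values_outside(data, ranges)
```

This only illustrates the logic; whether an equivalent rewrite helps in BigQuery depends on table sizes and how the join is planned.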
0
votes
0
answers
142
views
Python, bigquery set session_user
Good day all,
I am learning Python and am trying to connect to Google BigQuery. In that I succeeded.
I have a view in BigQuery that uses the Session_User() functionality to limit the information that ...
0
votes
1
answer
58
views
Need help replacing poorly formatted string dates as properly formatted timestamps in BigQuery
I am working on the Google Data Analytics Certificate and trying to clean a dataset consisting of 3 columns in BigQuery:
An Id number
A date in MM/DD/YYYY HH:MM:SS AM/PM format
Number of calories
...
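A minimal sketch of the parsing step, written in Python rather than BigQuery SQL. The format string assumes the MM/DD/YYYY HH:MM:SS AM/PM layout described above, and the sample values are invented for illustration. The same format elements should carry over to BigQuery's PARSE_TIMESTAMP, though that is not verified here.

```python
from datetime import datetime

# Assumed layout: MM/DD/YYYY HH:MM:SS AM/PM (12-hour clock).
FORMAT = "%m/%d/%Y %I:%M:%S %p"

def parse_timestamp(raw: str) -> datetime:
    """Parse a 12-hour-clock date string into a datetime."""
    return datetime.strptime(raw.strip(), FORMAT)

# Hypothetical sample values, not taken from the dataset.
samples = ["04/12/2016 01:05:09 PM", "04/12/2016 12:00:00 AM"]
parsed = [parse_timestamp(s) for s in samples]
```

Note that %I (12-hour) together with %p is what maps 12:00:00 AM to midnight; using %H here would misread the afternoon values.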
0
votes
2
answers
110
views
SQL recursive CTEs to perm array
I'm having some trouble formulating the correct recursive behaviour to get what I want in SQL.
I'm limited to the BigQuery environment and I want to avoid using any JavaScript if I can, so I wanted to ...
-1
votes
2
answers
95
views
GCP Batch Dataflow - Records Dropped while inserting to BigQuery
I'm using GCP Batch Dataflow to process data that I'm reading from a table. The input is table data, which I fetch with a query in Java.
After processing, when I'm trying to insert the ...
2
votes
1
answer
159
views
Why are BQ snapshots using the full storage amount of the original table?
I was hoping for an explanation of some unexpected BQ snapshot behavior. Here's a minimal example to create a table and snapshot it twice:
-- Create a partitioned table and populate it with data
...
0
votes
1
answer
71
views
SQL Bigquery Complex Regexp
I am trying to write a SQL query where I check string values in a column and if the
column contains names that are in the name_list that have a hyphen before or after the target name, I want to flag ...
0
votes
1
answer
103
views
Calculate weekly retention each day
I'm looking to calculate NURR, CURR and RURR each day for some user data from an app. I've started with NURR (New User Retention Rate), which is defined as:
The percentage of users who first opened ...
0
votes
0
answers
52
views
Preserving null elements in array when loading Avro to BigQuery
I'm working with an Avro file that contains an array field where each element can be either null or long. The Avro schema for the field looks something like this:
{
"name": "myArray...
1
vote
0
answers
117
views
Java 17 with Google Bigquery `google.cloud.bigquerystorage` library | Compilation fails because of conflicting package
I'm trying to use Google's library for BQ integration. I'm using Maven to manage dependencies and Java 17 for compilation and runtime.
In my pom.xml I use their BOM for version management, two ...
0
votes
0
answers
45
views
Refresh Schema for external tables for Delta Lake
I have created an external table in BigQuery, which is based on the GCS path that holds data in Delta format. I updated the schema of that GCS data using the mergeSchema command. Now, I am attempting ...
0
votes
1
answer
26
views
404 Not found: Table airflow-project-446808:retail.raw_invoices was not found in location US;
I am trying to run dbt transforms in Airflow. My table raw_invoices is already created in the US location, but I am still getting the error below in Airflow.
-- dim_customer.sql
--Create the dimension table
...
0
votes
1
answer
233
views
BigQuery - Clustering and null values
I am conducting tests with BigQuery on the clustering feature, and the results are strange. I can't understand them; they go against what I expected.
Here are the tests conducted:
I have two tables ...
0
votes
1
answer
54
views
In BigQuery does MERGE statement with search_condition have any benefits comparing to INSERT with filtering
I have a source table and I want to insert into the destination table only the rows whose keys do not exist there (it's a slightly simplified description of what I want to achieve). I can do it like ...
1
vote
1
answer
95
views
How to Prevent BigQuery from Adding a Top-Level 'Root' Record and Auto-Prefixing Nested Fields in Avro Export?
I've been facing issues generating Avro files from a BigQuery dataset while trying to maintain a predefined schema. My goal is to export Avro files without any post-processing, ensuring the schema ...
0
votes
1
answer
60
views
Working on the 'New York Taxi Trip' project in BigQuery on GCP: issue converting 2 columns to datetime64[ns] format from timestamp[us, tz=UTC][pyarrow]
I am working on the 'New York Taxi Trip' project in BigQuery on GCP.
The data has 2 columns: pickup time and dropoff time (consider the DataFrame variable df).
df.info() states the format of the two ...
-1
votes
4
answers
148
views
How to round to 0.5 or 1?
I'm looking to round values like
Input | Desired Output
0.01 | 0.5
2.3913 | 2.5
4.6667 | 4.5
2.11 | 2.5
How can I manage this in BigQuery? I have a script below which gives me 2 instead of 2.5.
(round((2.11) * ...
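One rule consistent with all four input/output pairs listed above is "drop the fractional part, then add 0.5". That is only a guess at the intent (the title mentions 0.5 or 1), so the Python sketch below illustrates that single reading and nothing more; the function name to_half_step is hypothetical. In BigQuery the analogous expression would be FLOOR(x) + 0.5.

```python
import math

def to_half_step(x: float) -> float:
    """Map x to the .5 value inside its unit interval: floor(x) + 0.5."""
    return math.floor(x) + 0.5

# The four input/output pairs listed in the question.
examples = {0.01: 0.5, 2.3913: 2.5, 4.6667: 4.5, 2.11: 2.5}
results = {x: to_half_step(x) for x in examples}
```

Plain rounding to the nearest 0.5 would not reproduce these pairs, since it sends 2.11 to 2.0 and 0.01 to 0.0.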
0
votes
1
answer
48
views
Extracting JSON with nested `$` keys
I'm looking for a method to parse a JSON with nested keys containing the $ symbol.
with example as (
select '{"id1":{"$id2":"12345"},"something":{"$id3...
1
vote
1
answer
336
views
How can I actually see the merge instruction for incremental models in dbt core?
According to dbt docs you just add some instruction like
{{
config(
materialized='incremental',
incremental_strategy='merge',
unique_key='date_day'
)
}}
...
{% if ...
0
votes
0
answers
111
views
Create a read session with BigQuery Emulator Testcontainer with Apache Beam
I am using the BigQuery emulator (https://github.com/goccy/bigquery-emulator) for my integration tests on my local machine.
I have a problem on my job that cannot create a read session, the job stuck ...
0
votes
1
answer
85
views
Counting the number of protests from the GDELT database
Unfortunately I have no experience with BigQuery or programming in general, but I need data from GDELT for my thesis and can't access it through the analysis tool. That's why I created this query to ...
0
votes
1
answer
181
views
How to replace underscores with spaces in BigQuery column names?
I recently loaded a dataset into BigQuery, and all my column names use underscores (_) instead of spaces. For example:
car_type instead of Car Type
customer_name instead of Customer Name
I would like ...
0
votes
1
answer
64
views
BigQuery - Sorting a Datetime string
I have a column of datatype String. Eg: 2025-01-20T23:38:31.8223598Z
If I apply an ORDER BY on this column inside a window function as below:
ROW_NUMBER() OVER (PARTITION BY id ORDER BY modifiedOn DESC)...
0
votes
2
answers
140
views
Difference in query and error even though distinct values
This is BigQuery; below are the sample tables and their contents.
dde-demo-d001.sap_crm.transactions_bkup
case_guid
transaction_header_guid
005056935CD81EEF92DF522476D53CAB
00505693-5CD8-1EEF-92DF-...
-1
votes
1
answer
103
views
Match array column with comma separated value in Bigquery Table
I have two tables in BigQuery - let's say:
Table A
column 1 value a
column 2 comma separated Values (aa,ab,ac)
column 3
Table B
column 1 Array [a,b,c,d]
column 2 Array [aa,ad]
column 4
Now I want to ...
1
vote
0
answers
74
views
BigQuery Load Fails with INVALID_ARGUMENT: FLOAT Field Type Mismatch Due to Nulls in Parquet Data
I'm encountering an error when loading Parquet data into BigQuery. My table schema defines a field (e.g., voltage) as FLOAT and NULLABLE. However, when the Parquet file contains null values for this ...
0
votes
1
answer
96
views
Not getting all the namespaces in BigQuery billing export table
I have enabled the export of GCP Cloud Billing data to BigQuery. For GKE, I'm not getting all the workload namespaces in the gcp_billing_export table.
I have enabled GKE metering. It's created the ...
1
vote
1
answer
169
views
BigQuery: When to Use JSON vs. ARRAY<STRUCT<key, value>> for Storing Key-Value Data?
I’m working with key-value data in BigQuery and considering two possible ways to store it:
As a JSON column
As an ARRAY of STRUCTs, where each struct has a key (STRING) and value (STRING)
Is there a ...
0
votes
0
answers
54
views
How to query into project labels (not BQ labels)
We are using labels on GCP projects, and I am trying to figure out how to "unwrap" them/ query against them. I am finding almost no success as any search with BigQuery and labels returns ...
1
vote
0
answers
144
views
How to query Google Ads Tables on BigQuery
I need guidance on how to create a query in BigQuery using Google Ads data transferred via a Data Transfer. My issue is that there are two tables:
p_ads_CampaignBasicStats_, which contains ...
0
votes
2
answers
172
views
Beam/Dataflow pipeline writing to BigQuery fails to convert timestamps (sometimes)
I have a Beam/Dataflow pipeline that reads from Pub/Sub and writes to BigQuery with WriteToBigQuery. I convert all timestamps to apache_beam.utils.timestamp.Timestamp. I am sure all timestamps are ...
0
votes
0
answers
72
views
Simba JDBC Null pointer exception when querying tables via BigQuery Databricks connection
I have a federated connection to BigQuery that has GA events tables for each of our projects. I'm trying to query each daily table which contains about 400,000 each day, and load into another table, ...
0
votes
2
answers
163
views
Using multiple CREATE TEMP FUNCTION statements runs fine, fails when executed
I'm trying to create some reusable functions within my SQLX file, and I'm finding that if I use CREATE TEMP FUNCTION statements, it fails when I try to execute the job.
Take this very basic ...
0
votes
0
answers
75
views
Add a descriptor for nested messages in Protobuf Python
I am using protobuf to send rows to BigQuery to populate tables.
One issue I am facing is with nested fields/messages.
Indeed, I only know how to add a descriptor for one message, but in the case when ...
-2
votes
1
answer
60
views
Extract part of string using BigQuery
I have a text field with a directory structure of which I'd like to extract either the 2nd or 3rd word from the field - depending on other criteria. The structure is separated by '/' character. ...
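A minimal sketch of picking the 2nd or 3rd '/'-separated word, in Python rather than BigQuery SQL. The helper name nth_segment and the sample path are hypothetical, and the "other criteria" that decide between the two positions are not modeled. A rough BigQuery counterpart would be SPLIT(field, '/')[SAFE_OFFSET(n - 1)], which, unlike this sketch, keeps empty segments such as the one before a leading slash.

```python
from typing import Optional

def nth_segment(path: str, n: int) -> Optional[str]:
    """Return the n-th (1-based) non-empty '/'-separated segment, or None."""
    segments = [s for s in path.split("/") if s]
    return segments[n - 1] if 0 < n <= len(segments) else None

# Hypothetical sample value, not taken from the question.
sample = "root/reports/finance/q1"
second = nth_segment(sample, 2)
third = nth_segment(sample, 3)
missing = nth_segment(sample, 9)  # out of range, so None
```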
1
vote
1
answer
240
views
Dataproc PySpark Job Fails with BigQuery Connector Issue - java.util.ServiceConfigurationError
I'm trying to run a PySpark job on Google Cloud Dataproc that reads data from BigQuery, processes it, and writes it back. However, the job keeps failing with the following error:
java.util....
0
votes
1
answer
89
views
Bigquery transform dictionary to array
I have a BigQuery table with 2 fields: uid (string) and cart (json). The json field has random keys (the names of the keys are not predictable).
Example of a row:
uid | ...
0
votes
0
answers
89
views
How can I properly stream results in BigQuery?
I am trying to use the following code to stream a query that returns 250 million results from a BigQuery table:
async* streamData(): AsyncGenerator<DataRecord> {
const query = `
SELECT
...
0
votes
2
answers
209
views
Bulk decompressing files on GCS
I have a shell script that processes compressed (gzipped) .avro files stored in GCS and loads them into BigQuery.
Here's the current small setup:
process_file.sh:
#!/bin/bash
set -e
PROJECT_ID="...
0
votes
1
answer
104
views
Set table description when creating table with apache_beam.io.gcp.bigquery.WriteToBigQuery
Is it possible to create a table with a provided description string (for the table) using Apache Beam's WriteToBigQuery?
The additional_bq_parameters argument is useful to set, for example, the ...
0
votes
1
answer
171
views
How can I connect an externally hosted MySQL database to BigQuery? [closed]
I have a MySQL database that is externally hosted (not on Google Cloud), and I want to connect it to BigQuery for querying and analysis. Unfortunately, I cannot create an SSH tunnel to the MySQL ...
2
votes
0
answers
130
views
How to test (pytest), or mock, an app.state object of fastapi
After some research, I can't find help on mocking app.state.* objects.
Here is my code:
def get_app():
settings = get_settings()
application = FastAPI()
config = {...}
application = ...
0
votes
2
answers
273
views
403 iam.serviceAccounts.actAs permission error trying to attach a service account to a resource in another project
I'm testing the required permissions to create a scheduled query on BigQuery.
The scheduled query will be programmatically created in project1 with a service account ([email protected]....
0
votes
1
answer
69
views
HORIZON in CREATE_MODEL statement for BigQuery ML forecast using ARIMA_PLUS
What's the purpose of HORIZON in bqml CREATE_MODEL statement for ARIMA_PLUS?
The resulting model doesn't have any forecast and I think CREATE_MODEL will fit the model based on all data points, without ...
0
votes
1
answer
95
views
Python dictionary with nested dict to bigquery
Assume my BigQuery table has a column of JSON type.
I am trying to insert a dictionary to my table using python bq client. I am able to insert the data using different functions like load_table_from_json. I ...