
Assume my BigQuery table has a column of JSON type.

I am trying to insert a dictionary into my table using the Python BigQuery client. I am able to insert the data using functions like load_table_from_json, and I have specified the column as JSON type in the job config schema of the BigQuery table.

My dictionary in Python - {"A": 2, "B": 1, "C": 4, "D": 3}
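
Roughly, this is what my load code looks like (project, dataset, table, and column names below are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# The dictionary I want to land in the JSON column.
data = {"A": 2, "B": 1, "C": 4, "D": 3}

# The column is declared as JSON in the job config schema.
job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("my_json_column", "JSON")],
)

client.load_table_from_json(
    [{"my_json_column": data}], "project.dataset.table", job_config=job_config
).result()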

The result in the table -

"{\n\"A\":2,\n\"B\":1,\n\"C\":4,\n\"D\":3\n}"

I don't want these escape characters to appear. I understand why they are appearing: BigQuery converts the dictionary into a form it can understand, resulting in the above structure when the value is written to the table.

I have a manual workaround: if I write the INSERT statement directly and run it with the bq_client.query() function, I can make use of the PARSE_JSON function, which gives the desired result of {"A": 2, "B": 1, "C": 4, "D": 3}.
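
Roughly, the workaround looks like this (table and column names are placeholders):

# Manual workaround: write the INSERT myself so I can call PARSE_JSON in SQL.
sql = """
INSERT INTO `project.dataset.table` (my_json_column)
VALUES (PARSE_JSON('{"A": 2, "B": 1, "C": 4, "D": 3}'))
"""
bq_client.query(sql).result()  # wait for the DML statement to finish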

But I couldn't find a way to achieve the same result through the Python BigQuery client library. Is there a way to do this?

  • Please post your code. It looks like you're calling json.dumps() to convert the dictionary to a string, then inserting that into the JSON column, which causes it to be encoded twice. Commented Jan 24 at 16:04
  • @Barmar No, I am directly passing the dict object; there is no json.dumps being used. I have specified in the BigQuery job config schema that the column is JSON, and I am passing the object directly. Commented Jan 24 at 17:11
  • If true, it sounds like a bug in the query library; it's encoding twice. As I said earlier, please post your code. Commented Jan 24 at 17:25
  • The question is very interesting; however, it takes effort to visualize exactly how you obtained this result. Would you please post a minimal working example? Thanks. Commented Jan 25 at 22:17

1 Answer


As Barmar mentioned, it seems like you're serializing the JSON twice. Here's why:

If you've already set your BigQuery column as a JSON type, there's no need to use json.dumps(). BigQuery is already expecting the data in JSON format. When you use json.dumps(), you're serializing the dictionary into a JSON string, and then when you insert it into BigQuery, it serializes it again, causing the escaped characters.

Instead, you should pass the dictionary directly to BigQuery, like this:

# `data` is the Python dictionary; do not json.dumps() it first.
rows = [{"your_json_column": data}]

load_job = client.load_table_from_json(rows, 'project.dataset.table')
load_job.result()  # wait for the load job to complete
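
For contrast, here is a small sketch of what produces the escaped value versus what does not (the column name is a placeholder, and data holds the dictionary):

import json

# Double-encoded: json.dumps() turns the dict into a string, and the client
# serializes that string again when loading, producing the escaped value
# shown in the question.
rows_wrong = [{"your_json_column": json.dumps(data)}]

# Single-encoded: pass the dict itself and let the client serialize it once.
rows_right = [{"your_json_column": data}]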