I'm aware of the ability to export a query to CSV files in a GCS bucket; however, when exporting to multiple files there doesn't seem to be a way to limit the row count of each file. Has anyone figured out a workaround for this? My current use case: I need to export a query over a table (152 columns) into multiple CSV files in a GCS bucket, with no more than 5,000 records per file. I was hoping to find some sort of statement I could pop into BigQuery to avoid having to write a solution in Python (since this is a short-term requirement and won't be needed after a certain point).
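For scale, the file count implied by the 5,000-row cap is just the total row count divided by 5,000, rounded up (the row count below is made up for illustration):

```python
import math

# Hypothetical total row count, just to show the arithmetic; 5,000 is the per-file cap.
total_rows = 152_483
rows_per_file = 5000

files_needed = math.ceil(total_rows / rows_per_file)  # → 31
```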
I came up with the following pseudo-SQL code:
offset = 0
total = SELECT COUNT(*) FROM WEB_SCRAPING.scrapy_products WHERE spider = "{0}"
while offset < total:
    EXPORT DATA OPTIONS(
        uri='gs://webscraping/{0}_{offset}-{offset + 5000}.csv',
        format='CSV',
        overwrite=true,
        header=true
    ) AS (
        SELECT * EXCEPT (id)
        FROM WEB_SCRAPING.scrapy_products
        WHERE spider = "{0}"
        LIMIT 5000 OFFSET {offset}
    )
    offset += 5000
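The chunking in the pseudo-code above can be sketched in plain Python — the helper name is mine, while the spider value, bucket path and filename pattern are the question's own:

```python
def chunk_specs(spider: str, total: int, size: int = 5000):
    """Yield (offset, uri) pairs mirroring the pseudo-code's loop.

    Illustrative helper only: it just enumerates the offsets and target
    filenames; the actual export would still run in BigQuery.
    """
    for offset in range(0, total, size):
        uri = f"gs://webscraping/{spider}_{offset}-{offset + size}.csv"
        yield offset, uri

# e.g. a spider with 12,000 matching rows needs three chunks:
specs = list(chunk_specs("albertsons.albertsons", 12000))
```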
I was thinking I could somehow wrap the EXPORT DATA statement in a looping construct, incrementing the offset until all results are exported. But from what I've seen, the EXPORT DATA statement requires a wildcard (*) in the URI, which will likely lead to the 5000-record chunks being split into even smaller files.
Any idea on how I can make this work from a BigQuery SQL standpoint? Is it even possible?
Update: Here's what I have so far as a BigQuery script:
DECLARE spider_name STRING DEFAULT 'albertsons.albertsons';
DECLARE increment INT64 DEFAULT 5000;
DECLARE tt_name STRING;
DECLARE `offset` INT64 DEFAULT 0;
DECLARE total INT64;

-- Reference script variables directly (no @ prefix), and don't name the
-- variable `spider`: the column of the same name would take precedence.
SET total = (
  SELECT COUNT(*)
  FROM `WEB_SCRAPING.SCRAPY_PRODUCTS`
  WHERE spider = spider_name
);

WHILE `offset` < total DO
  -- Dots in the spider name would read as dataset separators in a table name.
  SET tt_name = FORMAT('WEB_SCRAPING.%s_%dto%d',
                       REPLACE(spider_name, '.', '_'), `offset`, `offset` + increment);
  -- A table name can't be a variable, and LIMIT/OFFSET won't take variables
  -- either, so build the statement as a string and EXECUTE IMMEDIATE it.
  -- NOTE: OFFSET without an ORDER BY is not guaranteed stable between runs.
  EXECUTE IMMEDIATE FORMAT("""
    CREATE OR REPLACE TABLE `%s` AS
    SELECT * EXCEPT (id)
    FROM `WEB_SCRAPING.SCRAPY_PRODUCTS`
    WHERE spider = '%s'
    LIMIT %d OFFSET %d
  """, tt_name, spider_name, increment, `offset`);
  -- Now, to just handle the export portion
  SET `offset` = `offset` + increment;
END WHILE;
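If the looping proves easier to do outside BigQuery after all, the per-chunk EXPORT DATA statements can be generated client-side and submitted one at a time (e.g. with the google-cloud-bigquery client). A sketch of the statement builder — the function name is mine; the table, bucket and column names come from the question:

```python
def export_statements(spider: str, total: int, size: int = 5000):
    """Build one EXPORT DATA statement per chunk of `size` rows.

    Illustrative only: each returned string would be submitted as its own
    BigQuery job by whatever client drives the loop.
    """
    stmts = []
    for offset in range(0, total, size):
        stmts.append(f"""
EXPORT DATA OPTIONS(
  uri='gs://webscraping/{spider}_{offset}-{offset + size}.csv',
  format='CSV',
  overwrite=true,
  header=true
) AS (
  SELECT * EXCEPT (id)
  FROM WEB_SCRAPING.SCRAPY_PRODUCTS
  WHERE spider = '{spider}'
  LIMIT {size} OFFSET {offset}
);""".strip())
    return stmts
```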
question from:
https://stackoverflow.com/questions/66048044/is-it-possible-to-export-a-bigquery-query-into-equal-5000-line-csv-files-in-gcs