I need to convert a batch of 23 CSV files stored in S3 into Parquet format.
Every input CSV file contains a header row.
When I ran the code that Glue generated for this, the output still contained 22 header rows as separate data rows, which means only the first header was ignored.
I need help ignoring all of the headers during this transformation.
Since I'm using the from_catalog function for my input, I don't have any format_options with which to ignore the header rows.
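One possible way around the missing format_options is to read the files straight from S3 with from_options instead of from_catalog, since the CSV reader there accepts a withHeader option. The following is only a sketch under assumptions: the input path is a placeholder (the question does not give it), and the helper names csv_source_kwargs and read_csv_with_headers are made up for illustration.

```python
def csv_source_kwargs(s3_paths):
    """Build keyword arguments for create_dynamic_frame.from_options (CSV on S3)."""
    return {
        "connection_type": "s3",
        # "recurse" asks Glue to read files under every listed prefix.
        "connection_options": {"paths": list(s3_paths), "recurse": True},
        "format": "csv",
        # withHeader tells the CSV reader to treat the first line as a header.
        "format_options": {"withHeader": True, "separator": ","},
        "transformation_ctx": "datasource0",
    }


def read_csv_with_headers(glue_context, s3_paths):
    """Read the CSV files directly from S3; glue_context is the job's GlueContext."""
    return glue_context.create_dynamic_frame.from_options(**csv_source_kwargs(s3_paths))


# Placeholder input location (assumption): replace with the real CSV prefix.
INPUT_PATHS = ["s3://my-bucket-name/full/s3/path-csv/"]
# Inside the Glue job:
#   datasource0 = read_csv_with_headers(glueContext, INPUT_PATHS)
```

Reading this way bypasses the catalog table, so the column names come from the file headers rather than from the table schema.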
Also, can I set an option on the Glue table to indicate that the files contain a header?
Would that automatically ignore the header when my job runs?
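For context on the table-option idea: Hive-style tables carry a skip.header.line.count property, which Athena honors for CSV tables. Whether a Glue job reading through from_catalog applies it to every file is an assumption that would need to be verified. The sketch below only shows how the property could be set with boto3. The helper names are hypothetical, and TABLE_INPUT_KEYS lists just the commonly used TableInput fields.

```python
SKIP_HEADER_KEY = "skip.header.line.count"

# get_table returns read-only fields (CreateTime, DatabaseName, ...) that
# update_table rejects, so only fields valid in TableInput are copied over.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters", "TargetTable",
}


def with_header_skip(table, lines=1):
    """Return a TableInput dict copied from `table` with the header-skip property set."""
    table_input = {k: v for k, v in table.items() if k in TABLE_INPUT_KEYS}
    params = dict(table_input.get("Parameters", {}))  # copy, do not mutate the input
    params[SKIP_HEADER_KEY] = str(lines)              # table parameters are strings
    table_input["Parameters"] = params
    return table_input


def mark_table_as_having_header(glue_client, database, table_name):
    """glue_client is expected to be boto3.client("glue")."""
    table = glue_client.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue_client.update_table(DatabaseName=database, TableInput=with_header_skip(table))


# Example call, using the names from the question:
#   mark_table_as_having_header(boto3.client("glue"), "my_datalake", "my-csv-files")
```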
Part of my current approach is below.
I'm new to Glue, and this code was actually auto-generated by Glue.
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "my_datalake", table_name = "my-csv-files", transformation_ctx = "datasource0")
datasink1 = glueContext.write_dynamic_frame.from_options(frame = datasource0, connection_type = "s3", connection_options = {"path": "s3://my-bucket-name/full/s3/path-parquet"}, format = "parquet", transformation_ctx = "datasink1")
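If the catalog read has to stay, one workaround for the leftover header rows is to filter them out between the read and the write. This is a minimal sketch, not part of the generated script: it assumes a header row is one where every column holds its own name, and the column names "id" and "name" are hypothetical. The comparison ignores case because catalog column names are often lowercased.

```python
def is_repeated_header(row, column_names):
    """True when every listed column holds its own name, i.e. the row is a header line."""
    return all(
        str(row[name]).strip().lower() == name.lower()
        for name in column_names
    )


def drop_repeated_headers(rows, column_names):
    """Plain-Python version of the filter, handy for checking the predicate locally."""
    return [row for row in rows if not is_repeated_header(row, column_names)]


# In the Glue job the same predicate could be passed to the Filter transform
# from awsglue.transforms (column names here are hypothetical):
#   cleaned = Filter.apply(
#       frame=datasource0,
#       f=lambda row: not is_repeated_header(row, ["id", "name"]),
#   )
# and `cleaned` would then be written instead of datasource0.
```

Checking every column rather than just one lowers the chance of dropping a real data row that happens to contain a column name as a value.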
Asked by Hemanth S. Vaddi; translated from Stack Overflow.