Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

csv - AWS Glue Job - CSV to Parquet. How to ignore header?

I need to convert a bunch (23) of CSV files (source: S3) into Parquet format.

All of the input CSV files contain a header row.

When I generated code for this using Glue, the output also contained 22 header rows as separate data rows, which means it ignored only the first header.

I need help ignoring all the headers during this transformation.

Since I'm using the from_catalog function for my input, I don't have any format_options to ignore the header rows.

Also, can I set an option in the Glue table to indicate that a header is present in the files? Will that automatically ignore the header when my job runs?

Part of my current approach is below. I'm new to Glue. This code was actually auto-generated by Glue.

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "my_datalake", table_name = "my-csv-files", transformation_ctx = "datasource0")

datasink1 = glueContext.write_dynamic_frame.from_options(frame = datasource0, connection_type = "s3", connection_options = {"path": "s3://my-bucket-name/full/s3/path-parquet"}, format = "parquet", transformation_ctx = "datasink1")
Asked by Hemanth S. Vaddi (translated from Stack Overflow)
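One possible direction, offered as a sketch and not a confirmed fix: read the CSV files straight from S3 with create_dynamic_frame.from_options instead of from_catalog, since from_options accepts format_options and the Glue CSV reader documents a withHeader option that treats the first line as a header. The function names csv_reader_kwargs and convert_csv_to_parquet and the source path argument are hypothetical; only the target path style comes from the question. Whether withHeader is applied to every one of the 23 files is an assumption worth checking on a small sample first.

```python
from typing import Any, Dict


def csv_reader_kwargs(source_path: str) -> Dict[str, Any]:
    """Build keyword arguments for create_dynamic_frame.from_options.

    source_path is a hypothetical S3 prefix that holds the CSV files.
    """
    return {
        "connection_type": "s3",
        "connection_options": {"paths": [source_path], "recurse": True},
        "format": "csv",
        # withHeader=True asks the CSV reader to treat the first line
        # as column names instead of data.
        "format_options": {"withHeader": True, "separator": ","},
        "transformation_ctx": "datasource0",
    }


def convert_csv_to_parquet(glue_context, source_path: str, target_path: str) -> None:
    """Read headered CSV files from S3 and write them back as Parquet.

    glue_context is expected to be an awsglue GlueContext created by the job.
    """
    datasource0 = glue_context.create_dynamic_frame.from_options(
        **csv_reader_kwargs(source_path)
    )
    # Same sink call as in the auto-generated script from the question.
    glue_context.write_dynamic_frame.from_options(
        frame=datasource0,
        connection_type="s3",
        connection_options={"path": target_path},
        format="parquet",
        transformation_ctx="datasink1",
    )
```

Inside the job script this would be called with the existing glueContext, the S3 location of the CSV files, and the Parquet output path, for example "s3://my-bucket-name/full/s3/path-parquet". Reading by path bypasses the catalog table, so the column names then come from the header row and not from the table schema.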

1 Answer

Waiting for an expert to answer.
