bash - Delete rows that have more than X columns in a CSV

I need to delete all the rows in a CSV file that have more than a certain number of columns.

This happens because the code that generates the CSV file sometimes skips some values and prints the next row on the same line.

Example: consider the following file. I want to remove all the rows that have more than 3 columns (i.e., more columns than the header has):

timestamp,header2,header3
1,1val2,1val3
2,2val2,2val3
3,4,4val2,4val3
5val1,5val2,5val3
6,6val2,6val3

The output file I would like to have is:

timestamp,header2,header3
1,1val2,1val3
2,2val2,2val3
5val1,5val2,5val3
6,6val2,6val3

I don't care that the merged row containing timestamps 3 and 4 is lost.

I would prefer a solution in bash, or perhaps awk, rather than Python, so that I can learn how to use it.

1 Answer

This can be done straightforwardly with awk:

awk -F, 'NF<=3' file

This uses the awk variable NF, which holds the number of fields in the current line. Since the field separator is set to a comma (with -F, or, equivalently, -v FS=","), it is just a matter of checking that the number of fields is not greater than 3. The condition NF<=3 does exactly that: when it is true, awk prints the line automatically, because a pattern with no action defaults to printing.
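
If you don't want to hard-code the column count, a common variation (a sketch, not part of the original answer) is to take the expected number of fields from the header line itself:

awk -F, 'NR==1{n=NF} NF<=n' file

Here NR==1{n=NF} records the header's field count in n, and NF<=n then keeps only the rows that do not exceed it.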

Test

$ awk -F, 'NF<=3' a
timestamp,header2,header3
1,1val2,1val3
2,2val2,2val3
5val1,5val2,5val3
6,6val2,6val3
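
Note that awk writes to standard output, so to get the cleaned file you would redirect the result to a new file (filtered.csv is just a placeholder name here):

awk -F, 'NF<=3' file > filtered.csv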
