bash - Split CSV files into smaller files but keeping the headers?

Question

Welcome To Ask or Share your Answers For Others

bash - Split CSV files into smaller files but keeping the headers?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

bash - Split CSV files into smaller files but keeping the headers?

I have a huge CSV file, 1m lines. I was wondering if there is a way to split this file into smaller ones but keeping the first line (CSV header) on all the files.

It seems split is very fast but is also very limited. You cannot add a suffix to the filenames like .csv.

split -l11000 products.csv file_

Is there an effective way to do this task in just bash? A one-line command would be great.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:34:56+0000

The answer to this question is yes, this is possible with AWK.

The idea is to keep the header in mind and print all the rest in filenames of the form filename.00001.csv:

awk -v l=11000 '(NR==1){header=$0;next}
                (NR%l==2) {
                   close(file); 
                   file=sprintf("%s.%0.5d.csv",FILENAME,++c)
                   sub(/csv[.]/,"",file)
                   print header > file
                }
                {print > file}' file.csv

This works in the following way:

(NR==1){header=$0;next}: If the record/line is the first line, save that line as the header.
(NR%l==2){...}: Every time we wrote l=11000 records/lines, we need to start writing to a new file. This happens every time the modulo of the record/line number hits 2. This is on the lines 2, 2+l, 2+2l, 2+3l,.... When such a line is found we do:
- close(file): close the file you just wrote too.
- file=sprintf("%s.%0.5d.csv",FILENAME,++c); sub(/csv[.]/,"",file): define the new filename as FILENAME.00XXX.csv
- print header > file: open the file and write the header to that file.
{print > file}: write the entries to the file.

note: If you don't care about the filename, you can use the following shorter version:

awk -v m=100 '
    (NR==1){h=$0;next}
    (NR%m==2) { close(f); f=sprintf("%s.%0.5d",FILENAME,++c); print h > f }
    {print > f}' file.csv

Categories

bash - Split CSV files into smaller files but keeping the headers?

bash - Split CSV files into smaller files but keeping the headers?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags