Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
685 views
in Technique[技术] by (71.8m points)

unix - split file on Nth occurrence of delimiter

Is there a one-liner to split a text file into pieces / chunks after every Nth occurrence of a delimiter?

example: the delimiter below is "+"

entry 1
some more
+
entry 2
some more
even more
+
entry 3
some more
+
entry 4
some more
+
...

There are several million entries, so splitting on every occurrence of delimiter "+" is a bad idea. I want to split on, say, every 50,000th instance of delimiter "+".

Unix commands "split" and "csplit" just don't seem to do this...

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Using awk you could:

awk '/^+$/ { delim++ } { file = sprintf("chunk%s.txt", int(delim / 50000)); print >> file; }' < input.txt 

Update:

To not include the delimiter, try this:

awk '/^+$/ { if(++delim % 50000 == 0) { next } } { file = sprintf("chunk%s.txt", int(delim / 50000)); print > file; }' < input.txt 

The next keyword causes awk to halt processing rules for this record and and advance to the next (line). I also changed the >> to > since if you run it more than once you probably don't want to append the old chunk files.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...