Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

bash - awk counting delimiter didn't go as expected

I am counting the number of delimiters, which is '|@~', in my client data. I have to do this because I sometimes receive rows with too few or too many delimiters. I normally use this command to find the number of delimiters per row:

awk -F "|@~" '{print NF-1}' myDATA

It usually works, but today it returns a count of only 2 where I expected 6. When I check the data manually, I can see 6 delimiters. I then tried to copy the row and paste it into Notepad++. Surprisingly, only part of the line was copied, and that part contains only 2 delimiters, just as the script reported. What causes this?

What I see and want to copy: 0123|@~123123|@~21321303|@~00000009213123|@~ 002133123123.|@~ 000000000.|@~CITY

Paste result: 0123|@~123123|@~21321303

Missing from the paste: |@~00000009213123|@~ 002133123123.|@~ 000000000.|@~CITY

There seems to be something between the last character of the 3rd field and the 3rd delimiter, because I had to copy the row in two parts to paste it into this site. That matches the awk result, which reports only 2 |@~ delimiters, but of course there are 6, not 2.

question from:https://stackoverflow.com/questions/65919131/awk-counting-delimiter-didnt-go-as-expected


1 Answer

As your hexdump revealed, there are null bytes in your text file.
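
If a hexdump is not at hand, a quick check along these lines can confirm whether a file contains null bytes. This is only a sketch: the sample record and the `sample` and `nul_count` names are made up for illustration, and the null byte is placed after the third field on the assumption that this is where the pasted text was cut off.

```shell
# Hypothetical sample file: one record with a NUL byte after the third field.
sample=$(mktemp)
printf '0123|@~123123|@~21321303\000|@~00000009213123|@~CITY\n' > "$sample"

# Keep only the NUL bytes (-c complements the set, -d deletes) and count them.
nul_count=$(( $(tr -cd '\000' < "$sample" | wc -c) ))
echo "NUL bytes found: $nul_count"

# od -c prints each NUL byte as \0, which makes its position visible.
od -c "$sample"

rm -f "$sample"
```

A count greater than zero means the file holds bytes that most editors and clipboards will not show.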

GNU Awk 4.1.4 and 5.1.0 seem to treat these as the end of the file. Example:

$ awk '{print NF}' <<< $'a b c\nx y'
3
2
$ awk '{print NF}' <<< $'a\0 b c\nx y'
1

In man awk I haven't found a way to change this behavior. However, you probably don't want the null bytes in your file to begin with. Therefore you can just delete them before applying awk. To delete all null bytes from a file use this command:

tr -d '\0' < /path/to/broken/input/file > /path/to/fixed/output/file
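
To see the cleanup and the delimiter count working together, something like the following could be tried. It is a sketch under assumptions: the temporary files and the `broken`, `fixed`, and `delims` names are hypothetical, the record is the one from the question with a null byte assumed to sit after the third field, and the field separator is written as `[|]@~` so that the pipe is matched literally.

```shell
# Hypothetical sample: the record from the question with a NUL byte after the third field.
broken=$(mktemp)
fixed=$(mktemp)
printf '0123|@~123123|@~21321303\000|@~00000009213123|@~ 002133123123.|@~ 000000000.|@~CITY\n' > "$broken"

# Delete every NUL byte; the quotes keep the shell from turning \0 into a plain 0.
tr -d '\0' < "$broken" > "$fixed"

# Count delimiters per row; [|] makes the pipe a literal character in the regex.
delims=$(awk -F '[|]@~' '{print NF-1}' "$fixed")
echo "$delims"   # prints 6 for the cleaned sample row

rm -f "$broken" "$fixed"
```

Writing to a separate output file keeps the original data untouched, so the two versions can still be compared afterwards.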
