I have duplicate words in a CSV file, and I need to number the repeats, going from this:
jsmith
jsmith
kgonzales
shouston
dgenesy
kgonzales
jsmith
to this:
jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com
I have something like that, but it doesn't work properly for me, or I can't get it to work.
A simple way to do it is to maintain an array using the username as the index and increment it each time you read a user, e.g.
awk '{ print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' file
The ternary (($1 in a) ? $1 a[$1] : $1) just checks whether the user is in a[] yet. If so, it uses the name plus the value stored in the array, $1 a[$1]; if the user is not in the array, it just uses the user, $1. The result of the ternary is concatenated with "@email.com" to complete the output.
Lastly, the value for the array element for the user is incremented, a[$1]++.
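For readers who find the ternary dense, the same logic can be written as an explicit if/else. This is only a sketch of the one-liner above, not a different method; the function name number_dupes and the variable name are made up for illustration.

```shell
# number_dupes is a hypothetical wrapper name, used here only for illustration.
number_dupes() {
    awk '{
        if ($1 in a)          # name seen before: append how many times so far
            name = $1 a[$1]
        else                  # first occurrence: use the name unchanged
            name = $1
        print name "@email.com"
        a[$1]++               # creates a[$1] with value 1, or bumps the count
    }' "$@"
}

# Same names as in the question, one per line.
printf '%s\n' jsmith jsmith kgonzales shouston dgenesy kgonzales jsmith | number_dupes
```

Because the increment happens after the print, the first repeat gets the suffix 1, the second repeat gets 2, and so on.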
Example Use/Output
With your names in a file called users you would have:
$ awk '{ print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' users
jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com
To Keep E-mail In Input File
If your input already contains an e-mail at the end of the username, then you simply want to output that record and skip to the next record, e.g.
awk '$1~/@/{print; next} { print (($1 in a) ? $1 a[$1] : $1) "@email.com"; a[$1]++ }' users
That will preserve [email protected] from your comment.
Example Input
jsmith
jsmith
kgonzales
shouston
[email protected]
dgenesy
kgonzales
jsmith
Example Output
jsmith@email.com
jsmith1@email.com
kgonzales@email.com
shouston@email.com
[email protected]
dgenesy@email.com
kgonzales1@email.com
jsmith2@email.com
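A quick way to convince yourself that the $1~/@/ rule behaves as described is to run it on a tiny input. The sketch below assumes a placeholder address, someone@example.org, which is not the address from your comment, and the function name keep_existing_email is likewise made up.

```shell
# keep_existing_email is a hypothetical wrapper name, used here only for illustration.
keep_existing_email() {
    awk '
        $1 ~ /@/ { print; next }   # already an address: print unchanged, skip the rule below
        {
            print (($1 in a) ? $1 a[$1] : $1) "@email.com"
            a[$1]++
        }
    ' "$@"
}

# someone@example.org is a placeholder address chosen for this example.
printf '%s\n' jsmith jsmith someone@example.org jsmith | keep_existing_email
```

The order of the two rules matters: next stops processing of the current record, so a line that already holds an address never reaches the counting rule and is never added to a[].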