You can do this by combining two join commands.
$ join -o '0,1.3,2.3' -a1 -a2 -e 'NA' file1 file2
Adam a1 a2
Bills b1 NA
Carol c1 c2
Dean d1 NA
Evan NA e2
First join the first two files together, using -a1 -a2 to make sure lines that are only present in one file are still printed. -o '0,1.3,2.3' controls which fields are output, and -e 'NA' replaces missing fields with NA.
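The input files are not shown here, so the following is only a sketch with made-up contents chosen to reproduce the output above: three whitespace-separated columns, with the key in column 1 and the value in column 3 (the `x` in column 2 is a placeholder). It contrasts the same join with and without -o and -e.

```sh
# Hypothetical sample inputs; join expects both files sorted on the key.
dir=$(mktemp -d)
printf '%s\n' 'Adam x a1' 'Bills x b1' 'Carol x c1' 'Dean x d1' > "$dir/file1"
printf '%s\n' 'Adam x a2' 'Carol x c2' 'Evan x e2' > "$dir/file2"

# Without -o, unpaired lines are printed as they are, so columns do not line up.
plain=$(join -a1 -a2 "$dir/file1" "$dir/file2")

# With -o and -e, every line gets the same three columns, padded with NA.
padded=$(join -o '0,1.3,2.3' -a1 -a2 -e 'NA' "$dir/file1" "$dir/file2")

printf '%s\n' "$plain" '---' "$padded"
rm -r "$dir"
```

In the first result the unpaired lines (Bills, Dean, Evan) have fewer columns than the paired ones, which is why the explicit -o list matters once a second join has to pick fields by position.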
$ join -o '0,1.3,2.3' -a1 -a2 -e 'NA' file1 file2 | join -o '0,1.2,1.3,2.3' -a1 -a2 -e 'NA' - file3
Adam a1 a2 NA
Bills b1 NA b3
Carol c1 c2 c3
Dean d1 NA NA
Evan NA e2 e3
Then pipe that join to another one which joins the third file. The trick here is passing in - as the first file name, which tells join to use stdin as the first file.
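To see the - operand on its own, the first join's result can be fed in by hand. In this small sketch the five input lines are copied from the two-file output above, while the contents of file3 are made up to match the final result (again with a placeholder `x` in column 2).

```sh
dir=$(mktemp -d)
# Hypothetical file3: key in column 1, value in column 3.
printf '%s\n' 'Bills x b3' 'Carol x c3' 'Evan x e3' > "$dir/file3"

# "-" makes join read its first input from stdin instead of a named file.
merged=$(printf '%s\n' 'Adam a1 a2' 'Bills b1 NA' 'Carol c1 c2' 'Dean d1 NA' 'Evan NA e2' |
    join -o '0,1.2,1.3,2.3' -a1 -a2 -e 'NA' - "$dir/file3")

printf '%s\n' "$merged"
rm -r "$dir"
```

Because the piped input already has one value per column, the -o list takes fields 1.2 and 1.3 from stdin and only field 2.3 from file3.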
For an arbitrary number of files, here's a script which applies this idea recursively.
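#!/bin/bash

# Recursively join all files given as arguments on their first column.
join_all() {
    local file=$1
    shift

    # Reduce the current file to "key value" (columns 1 and 3).
    awk '{print $1, $3}' "$file" | {
        if (($# > 0)); then
            # Join with the combined result of the remaining files,
            # which has $# value columns (fields 2 through $# + 1).
            join2 - <(join_all "$@") $(($# + 1))
        else
            cat
        fi
    }
}

# join2 FILE1 FILE2 COUNT: full outer join keeping fields 2..COUNT of FILE2.
join2() {
    local file1=$1
    local file2=$2
    local count=$3

    # Build the -o list "2.2 2.3 ... 2.COUNT".
    local fields=$(eval echo 2.{2..$count})

    join -a1 -a2 -e 'NA' -o "0 1.2 $fields" "$file1" "$file2"
}

join_all "$@"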
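The eval in join2 is needed because bash performs brace expansion before it expands $count, so 2.{2..$count} on its own would stay literal. If you would rather avoid eval, the field list can be built with a plain loop. This is only a sketch: build_fields is a hypothetical helper name, not part of the script.

```sh
# build_fields COUNT: print "2.2 2.3 ... 2.COUNT" (hypothetical helper).
build_fields() {
    count=$1
    i=2
    fields=
    while [ "$i" -le "$count" ]; do
        # Append "2.$i", adding a separating space once the list is non-empty.
        fields="${fields:+$fields }2.$i"
        i=$((i + 1))
    done
    printf '%s\n' "$fields"
}

build_fields 4
```

For a count of 4 this prints 2.2 2.3 2.4, the same list the eval line produces, and it could stand in for that line as fields=$(build_fields "$count").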
Example usage:
$ ./joinall file1
Adam a1
Bills b1
Carol c1
Dean d1
$ ./joinall file1 file2
Adam a1 a2
Bills b1 NA
Carol c1 c2
Dean d1 NA
Evan NA e2
$ ./joinall file1 file2 file3
Adam a1 a2 NA
Bills b1 NA b3
Carol c1 c2 c3
Dean d1 NA NA
Evan NA e2 e3