This response illustrates an efficient approach using jq.
In the example, the value of .id in each object is a string, so the first part of this response assumes that the key is always string-valued (the P.S. relaxes this assumption).
It is also assumed that the "rows" can be combined without regard to conflicting values, since jq's + operator is used to combine objects.
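To make the caveat about conflicting values concrete, here is a minimal sketch of how + merges two objects. The sample objects are illustrative only, and the value "XYZ" is invented to force a conflict.

```shell
# Disjoint keys (apart from the shared "id") are simply merged.
merged=$(jq -nc '{"id":"10","data":"abc"} + {"id":"10","content":"yui"}')
echo "$merged"     # {"id":"10","data":"abc","content":"yui"}

# When both objects carry the same key, the right-hand value wins.
conflict=$(jq -nc '{"id":"10","data":"abc"} + {"id":"10","data":"XYZ"}')
echo "$conflict"   # {"id":"10","data":"XYZ"}
```

In the join below, this means that a value from the second array silently replaces a same-named value from the first.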
# hashJoin(a1; a2; field) expects a1 and a2 to be arrays of JSON objects
# and that for each of the objects, the field value is a string.
# A relational join is performed on "field".
def hashJoin(a1; a2; field):
  # hash phase:
  (reduce a1[] as $o ({}; . + { ($o | field): $o } )) as $h1
  | (reduce a2[] as $o ({}; . + { ($o | field): $o } )) as $h2
  # join phase:
  | reduce ($h1|keys[]) as $key
      ([]; if $h2|has($key) then . + [ $h1[$key] + $h2[$key] ] else . end) ;
hashJoin( $file1; $file2; .id)[]
Invocation:
$ jq -nc --slurpfile file1 file1.json --slurpfile file2 file2.json -f join.jq
Output:
{"id":"10","data":"abc","content":"yui"}
{"id":"30","data":"qwe","content":"ujm"}
{"id":"40","data":"wsx","content":"tgb"}
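The invocation above relies on --slurpfile to turn each input file into an array, which is the shape hashJoin expects for a1 and a2. The following sketch shows that behavior in isolation. The file contents are hypothetical (two rows borrowed from the output above), and the temporary directory is only there to keep the example self-contained.

```shell
# Hypothetical input: a stream of two JSON objects, one per line.
tmp=$(mktemp -d)
printf '%s\n' '{"id":"10","data":"abc"}' '{"id":"30","data":"qwe"}' > "$tmp/file1.json"

# --slurpfile binds $file1 to an array holding every JSON value in the file.
slurped=$(jq -nc --slurpfile file1 "$tmp/file1.json" '$file1')
echo "$slurped"    # [{"id":"10","data":"abc"},{"id":"30","data":"qwe"}]

rm -r "$tmp"
```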
P.S. Here is a still more efficient implementation of hashJoin/3, which relaxes all assumptions about the specified "key" except that it must specify a valid key. Composite keys can be specified as arrays.
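def hashJoin(a1; a2; key):
  # String keys are used as-is; any other key is serialized with tojson.
  # Note: a string key such as "1" and the number 1 map to the same hash slot.
  def akey: key | if type == "string" then . else tojson end;
  def wrap: { (akey) : . } ;
  # hash phase: index a1 fully, then keep only the a2 rows whose key is in a1
  (reduce a1[] as $o ({}; . + ($o | wrap ))) as $h1
  | (reduce a2[] as $o
      ( {};
        ($o|akey) as $v
        | if $h1[$v] then . + { ($v): $o } else . end )) as $h2
  # join phase: every key in $h2 is known to be in $h1
  | reduce ($h2|keys[]) as $key
      ([]; . + [ $h1[$key] + $h2[$key] ] ) ;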
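As an illustration of the composite-key case, the sketch below repeats the definition above and joins two small arrays on the pair [.id, .region]. The rows and the "region" field are hypothetical and were chosen so that only one pair matches.

```shell
joined=$(jq -nc '
  def hashJoin(a1; a2; key):
    def akey: key | if type == "string" then . else tojson end;
    def wrap: { (akey) : . } ;
    (reduce a1[] as $o ({}; . + ($o | wrap ))) as $h1
    | (reduce a2[] as $o
        ( {};
          ($o|akey) as $v
          | if $h1[$v] then . + { ($v): $o } else . end )) as $h2
    | reduce ($h2|keys[]) as $key
        ([]; . + [ $h1[$key] + $h2[$key] ] ) ;

  # Hypothetical rows: only id 1 in region "eu" appears on both sides.
  [ {"id":1,"region":"eu","data":"abc"},
    {"id":2,"region":"us","data":"qwe"} ] as $a
  | [ {"id":1,"region":"eu","content":"yui"},
      {"id":2,"region":"eu","content":"ujm"} ] as $b
  | hashJoin($a; $b; [.id, .region])[]
')
echo "$joined"     # {"id":1,"region":"eu","data":"abc","content":"yui"}
```

The rows with id 2 are dropped because their regions differ, so the serialized keys [2,"us"] and [2,"eu"] do not match.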