Not always. Your queries are equivalent. But without WHERE t1.id = t2.id AND t2.id = t3.id
it will be CROSS JOIN
.
Update:
This is interesting question and I decided to add some demo. Let's create two tables:
A(c1 int, c2 string)
and B(c1 int, c2 string)
.
Load data:
insert into table A
select 1, 'row one' union all
select 2, 'row two';
insert into table B
select 1, 'row one' union all
select 3, 'row three';
Check data:
hive> select * from A;
OK
1 row one
2 row two
Time taken: 1.29 seconds, Fetched: 2 row(s)
hive> select * from B;
OK
1 row one
3 row three
Time taken: 0.091 seconds, Fetched: 2 row(s)
Check cross join (implicit join without where
transformed to cross):
hive> select a.c1, a.c2, b.c1, b.c2 from a,b;
Warning: Map Join MAPJOIN[14][bigTable=a] in task 'Stage-3:MAPRED' is a cross product
Warning: Map Join MAPJOIN[22][bigTable=b] in task 'Stage-4:MAPRED' is a cross product
Warning: Shuffle Join JOIN[4][tables = [a, b]] in Stage 'Stage-1:MAPRED' is a cross product
OK
1 row one 1 row one
2 row two 1 row one
1 row one 3 row three
2 row two 3 row three
Time taken: 54.804 seconds, Fetched: 4 row(s)
Check inner join (implicit join with where
works as INNER):
hive> select a.c1, a.c2, b.c1, b.c2 from a,b where a.c1=b.c1;
OK
1 row one 1 row one
Time taken: 38.413 seconds, Fetched: 1 row(s)
Try to perform left join by adding OR b.c1 is null
to the where:
hive> select a.c1, a.c2, b.c1, b.c2 from a,b where (a.c1=b.c1) OR (b.c1 is null);
OK
1 row one 1 row one
Time taken: 57.317 seconds, Fetched: 1 row(s)
As you can see we got inner join again. or b.c1 is null
is ignored
Now left join
without where
and ON
clause (transformed to CROSS):
select a.c1, a.c2, b.c1, b.c2 from a left join b;
OK
1 row one 1 row one
1 row one 3 row three
2 row two 1 row one
2 row two 3 row three
Time taken: 37.104 seconds, Fetched: 4 row(s)
As you can see we got cross again.
Try left join with where
clause and without ON
(works as INNER):
select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1;
OK
1 row one 1 row one
Time taken: 40.617 seconds, Fetched: 1 row(s)
We got INNER join
Try left join with where
clause and without ON
+ try to allow nulls:
select a.c1, a.c2, b.c1, b.c2 from a left join b where a.c1=b.c1 or b.c1 is null;
OK
1 row one 1 row one
Time taken: 53.873 seconds, Fetched: 1 row(s)
Again got INNER. or b.c1 is null
is ignored.
Left join with on
clause:
hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1;
OK
1 row one 1 row one
2 row two NULL NULL
Time taken: 48.626 seconds, Fetched: 2 row(s)
Yes, it is true left join.
Left join with on
+ where
(got INNER):
hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1;
OK
1 row one 1 row one
Time taken: 49.54 seconds, Fetched: 1 row(s)
We got INNER because WHERE does not allow NULLS.
Left join with where + allow nulls:
hive> select a.c1, a.c2, b.c1, b.c2 from a left join b on a.c1=b.c1 where a.c1=b.c1 or b.c1 is null;
OK
1 row one 1 row one
2 row two NULL NULL
Time taken: 55.951 seconds, Fetched: 2 row(s)
Yes, it is left join.
Conclusion:
- Implicit join works as INNNER (with where) or CROSS if without WHERE
clause.
- Left join can work as CROSS if without ON and without WHERE, can also work as INNER if WHERE clause does not allows nulls
for right
table.
- Better use ANSI syntax because it is self-explaining and it is easy to understand what do you expect it to work like. Implicit joins or left joins working as INNER or CROSS are difficult to understand and very prone to error.