hadoop - How does impala provide faster query response compared to hive

Question



1 Answer

深蓝 · Answer 1 · 2021-10-23T18:26:57+0000

You should see Impala as "SQL on HDFS", while Hive is more "SQL on Hadoop": classic Hive compiles your SQL into MapReduce jobs and lets Hadoop run them.

In other words, Impala doesn't use MapReduce at all. Instead, it keeps a long-running daemon (impalad) on each of your nodes. These daemons read the data directly from HDFS and execute the query with Impala's own distributed, in-memory engine, so they can return results quickly without going through a whole MapReduce job.
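As a rough analogy (not Impala's or Hive's actual code), the sketch below runs the same filter-then-aggregate query two ways. The "materialized" version writes each stage's full output to a temporary file before the next stage reads it, similar in spirit to chained MapReduce stages. The "pipelined" version streams rows from one operator to the next in memory. The data, operator names and file layout are all hypothetical.

```python
import json
import os
import tempfile
from collections import defaultdict

# Hypothetical rows standing in for a table stored in HDFS.
ROWS = [
    {"region": "eu", "amount": 10},
    {"region": "us", "amount": 25},
    {"region": "eu", "amount": 5},
    {"region": "us", "amount": -3},
]


def filter_positive(rows):
    # Operator 1: keep only rows with a positive amount.
    for row in rows:
        if row["amount"] > 0:
            yield row


def sum_by_region(rows):
    # Operator 2: aggregate the amounts per region.
    totals = defaultdict(int)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_materialized(rows):
    """MapReduce-like: stage 1 writes all its output to disk before stage 2 starts."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.jsonl")
        with open(path, "w") as f:
            for row in filter_positive(rows):
                f.write(json.dumps(row) + "\n")
        # Stage 2 re-reads the intermediate file from disk.
        with open(path) as f:
            return sum_by_region(json.loads(line) for line in f)


def run_pipelined(rows):
    """Long-running-daemon style: rows flow between operators in memory."""
    return sum_by_region(filter_positive(rows))


print(run_materialized(ROWS))  # {'eu': 15, 'us': 25}
print(run_pipelined(ROWS))     # {'eu': 15, 'us': 25}
```

Both produce the same answer. The difference is the extra write and read of the intermediate data, which becomes significant on a real cluster where it goes to disk for every stage.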

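The reason this helps is that running a MapReduce job carries a fixed overhead (scheduling the job, starting a JVM for each task, and writing intermediate results to disk between stages). By bypassing MapReduce altogether you can get a large gain in runtime, especially for small queries where that overhead dominates.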
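To illustrate why fixed overhead matters most for small queries, here is a toy latency model. Every constant below is a made-up number chosen only to show the shape of the effect. None of them are measurements of Hive or Impala, and real costs (network shuffles, scan time, queueing) are ignored.

```python
def mapreduce_latency(stages, job_startup_s, stage_io_s, compute_s):
    """Toy model: each chained job pays startup cost plus writing/reading its output."""
    return stages * (job_startup_s + stage_io_s) + compute_s


def long_running_daemon_latency(compute_s):
    """Toy model: daemons are already up and stream data in memory,
    so only the compute time remains."""
    return compute_s


# Hypothetical numbers, for illustration only.
STAGES = 3          # e.g. a query compiled into three chained jobs
JOB_STARTUP_S = 10  # scheduling + task JVM startup per job
STAGE_IO_S = 5      # writing and re-reading intermediate output per job
COMPUTE_S = 2       # actual work for a small ad-hoc query

print(mapreduce_latency(STAGES, JOB_STARTUP_S, STAGE_IO_S, COMPUTE_S))  # 47
print(long_running_daemon_latency(COMPUTE_S))                           # 2
```

With a large compute_s the two numbers converge, which matches the intuition that the speedup is most dramatic for small, interactive queries.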

That being said, Impala does not replace Hive; the two are good for very different use cases. Unlike Hive, Impala doesn't provide fault tolerance during a query: if a node fails while your query is running, the query fails and you have to run it again. For ETL-type jobs, where the failure of one job would be costly, I would definitely recommend Hive. Impala, on the other hand, can be great for small ad-hoc queries, for example for data scientists or business analysts who just want to look at and analyze some data without building robust jobs.

Also, from my personal experience, Impala is still not very mature, and I've occasionally seen it crash when the amount of data is larger than the available memory.
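The fault-tolerance difference described above can be sketched as follows. This is a simplified simulation, not how either engine is implemented: the failure model (a "flaky" task fails only on its first attempt), the TaskFailed exception and the per-task results are all hypothetical. The MapReduce-style runner re-executes a failed task and carries on; the Impala-style runner lets any task failure abort the whole query.

```python
class TaskFailed(Exception):
    """Raised when a (simulated) node dies while running a task."""


def run_task(task_id, attempt, flaky_tasks):
    # Hypothetical failure model: tasks in flaky_tasks fail on their first attempt only.
    if task_id in flaky_tasks and attempt == 1:
        raise TaskFailed(f"task {task_id} lost its node")
    return task_id * 10  # stand-in for the task's partial result


def run_with_task_retry(task_ids, flaky_tasks, max_attempts=4):
    """MapReduce-style: a failed task is re-executed; the job keeps going."""
    results = []
    for task_id in task_ids:
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(run_task(task_id, attempt, flaky_tasks))
                break
            except TaskFailed:
                if attempt == max_attempts:
                    raise  # give up only after exhausting all attempts
    return sum(results)


def run_without_retry(task_ids, flaky_tasks):
    """Impala-style (as described above): one failed task fails the whole query."""
    return sum(run_task(task_id, 1, flaky_tasks) for task_id in task_ids)


tasks = [1, 2, 3, 4]
flaky = {3}
print(run_with_task_retry(tasks, flaky))  # 100
try:
    run_without_retry(tasks, flaky)
except TaskFailed as err:
    print("query failed:", err)  # query failed: task 3 lost its node
```

For a short interactive query, rerunning from scratch is cheap, which is why the lack of mid-query recovery matters much less there than for a long ETL job.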
