hadoop - Pig vs Hive vs Native Map Reduce

1 Answer

深蓝 · Answer 1 · 2021-10-23T18:37:24+0000

Complex branching logic with many nested if .. else .. structures is easier and quicker to implement in Standard MapReduce. For processing structured data you could use Pangool, which also simplifies things like JOIN. Standard MapReduce also gives you full control to minimize the number of MapReduce jobs that your data processing flow requires, which translates into performance. The trade-off is that it takes more time to code and to introduce changes.
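As a rough illustration of the branching point, here is a minimal sketch in plain Python of a mapper and reducer. It is not Hadoop API code, and the record fields (`kind`, `amount`, `country`) and output keys are hypothetical. It only shows how nested if/else logic sits naturally inside a hand-written map function.

```python
from collections import defaultdict


def map_record(record):
    """Mapper: emit (key, value) pairs. The nested branches are the point."""
    kind = record.get("kind")          # hypothetical field
    amount = record.get("amount", 0)   # hypothetical field
    if kind == "order":
        if amount >= 100:
            if record.get("country") == "US":
                yield ("large_us_order", amount)
            else:
                yield ("large_intl_order", amount)
        else:
            yield ("small_order", amount)
    elif kind == "refund":
        yield ("refund", -amount)
    # Records of any other kind are silently dropped.


def reduce_values(key, values):
    """Reducer: sum all values that share a key."""
    return (key, sum(values))


def run_job(records):
    """Simulate one map -> shuffle -> reduce pass in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_values(k, v) for k, v in groups.items())
```

Everything here runs in a single pass, which mirrors the idea of keeping the number of jobs under your own control.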

Apache Pig is good for structured data too, but its advantage is the ability to work with BAGs of data (all rows that are grouped on a key). This makes it simpler to implement things like:

Get the top N elements for each group;
Calculate a total for each group and then put that total against each row in the group;
Use Bloom filters for JOIN optimisations;
Multiquery support (this is when Pig tries to minimise the number of MapReduce jobs by doing more work in a single job).
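To make the first two items in the list concrete, here is a minimal sketch in plain Python of what "working with a bag" means. It is not Pig Latin and does not use any Pig API. The function names and the `group_total` field are made up for illustration.

```python
from collections import defaultdict
import heapq


def group_by_key(rows, key):
    """Collect all rows that share a key into one list (the 'bag')."""
    bags = defaultdict(list)
    for row in rows:
        bags[row[key]].append(row)
    return bags


def top_n_per_group(rows, key, order_by, n):
    """Return the n rows with the largest order_by value in each group."""
    bags = group_by_key(rows, key)
    return {
        k: heapq.nlargest(n, bag, key=lambda r: r[order_by])
        for k, bag in bags.items()
    }


def attach_group_total(rows, key, field):
    """Compute the per-group total of field and copy it onto every row."""
    out = []
    for bag in group_by_key(rows, key).values():
        total = sum(r[field] for r in bag)
        for r in bag:
            out.append({**r, "group_total": total})  # hypothetical field name
    return out
```

Both operations need the whole group at once, which is why a language with a first-class bag type expresses them more directly than a row-at-a-time query.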

Hive is better suited for ad-hoc queries, but its main advantage is that it has an engine that stores and partitions data. Its tables can still be read from Pig or Standard MapReduce.
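As a rough picture of what partitioned storage buys you, the sketch below models the `column=value` directory layout that Hive uses for partitions, plus a simple pruning step that skips partitions a query does not need. It is plain Python, not Hive code. The table root `/warehouse/sales` and the helper names are hypothetical.

```python
def partition_path(table_root, partition_spec):
    """Build a Hive-style partition directory such as root/year=2021/month=10."""
    parts = [f"{col}={val}" for col, val in partition_spec.items()]
    return "/".join([table_root.rstrip("/")] + parts)


def prune_partitions(partitions, predicate):
    """Keep only the partition specs that satisfy the query predicate."""
    return [p for p in partitions if predicate(p)]


# Hypothetical table with three monthly partitions.
all_partitions = [
    {"year": 2021, "month": 8},
    {"year": 2021, "month": 9},
    {"year": 2021, "month": 10},
]

# A query filtering on month >= 9 only has to read two directories.
to_scan = [
    partition_path("/warehouse/sales", p)
    for p in prune_partitions(all_partitions, lambda p: p["month"] >= 9)
]
```

Because the layout is just directories of files, other tools can read the same data, which is consistent with the point that the tables are accessible from Pig or Standard MapReduce.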

One more thing: Hive and Pig are not well suited to working with hierarchical data.
