Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

mapreduce - Would an HBase Scan Perform Better with Multiple Column Families or a Single Column Family?

I would like to store an object (payload) along with some metadata in HBase.

Then I would like to run queries on the table and pull out the payload part based on metadata info.

For example, let's say I have the following column qualifiers:

  • P: Payload (larger than M1 + M2).
  • M1: Meta-Data1
  • M2: Meta-Data2

Then I would run a query such as:

  • Fetch all Payload where M1='search-key1' && M2='search-key2'
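To make the intended semantics of that query concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not the HBase client API: the table is assumed to be a sorted map from row key to a map of qualifier to value, and the class name `PayloadQuery`, the method `fetchPayloads`, and the sample rows are all hypothetical. With the real client, the equivalent would presumably be a `Scan` carrying a `FilterList` of two `SingleColumnValueFilter` instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of the query; this is NOT the HBase client API.
final class PayloadQuery {

    // Returns the P value of every row whose M1 and M2 both match, in row-key order.
    static List<String> fetchPayloads(Map<String, Map<String, String>> table,
                                      String m1, String m2) {
        List<String> payloads = new ArrayList<>();
        for (Map<String, String> row : table.values()) {
            if (m1.equals(row.get("M1")) && m2.equals(row.get("M2"))) {
                String payload = row.get("P");
                if (payload != null) {
                    payloads.add(payload);
                }
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        // TreeMap keeps rows sorted by key, loosely mirroring HBase row ordering.
        Map<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("P", "payload-1", "M1", "search-key1", "M2", "search-key2"));
        table.put("row2", Map.of("P", "payload-2", "M1", "search-key1", "M2", "other"));
        table.put("row3", Map.of("P", "payload-3", "M1", "search-key1", "M2", "search-key2"));

        System.out.println(fetchPayloads(table, "search-key1", "search-key2"));
        // prints [payload-1, payload-3]
    }
}
```

Note that the query filters on M1 and M2 but returns P, so it needs all three columns for every matching row.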

Does it make sense to:

  1. Keep M1 and M2 in one column family and P in another column family? Would the scan be quicker?
  2. Keep all 3 columns in the same column family?

Normally, I would do a spike (I may still need to), but I thought I would ask first.

question from:https://stackoverflow.com/questions/65851636/would-an-hbase-scan-performs-better-with-multiple-column-families-or-single-colu


1 Answer

I'd follow the advice given in the HBase Reference Guide and go with option #2 (keep all 3 columns in the same column family):

Try to make do with one column family if you can in your schemas. Only introduce a second and third column family in the case where data access is usually column scoped; i.e. you query one column family or the other but usually not both at the one time.
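One way to apply the quoted rule to the question is to check how many column families the query has to read under each layout. The following is only a sketch of that reasoning in plain Java: the family names `meta`, `payload` and `d`, and the class `FamilyAccess`, are hypothetical, and the code models which families are needed, not HBase I/O or actual scan timings.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model: which column families does a query need under a given layout?
final class FamilyAccess {

    // layout maps each qualifier to its column family (hypothetical names).
    static Set<String> familiesTouched(Map<String, String> layout, List<String> qualifiers) {
        Set<String> families = new TreeSet<>();
        for (String qualifier : qualifiers) {
            String family = layout.get(qualifier);
            if (family == null) {
                throw new IllegalArgumentException("unknown qualifier: " + qualifier);
            }
            families.add(family);
        }
        return families;
    }

    public static void main(String[] args) {
        // The query filters on M1 and M2 and returns P.
        List<String> query = List.of("M1", "M2", "P");

        // Option 1: metadata and payload in separate families.
        Map<String, String> option1 = Map.of("M1", "meta", "M2", "meta", "P", "payload");
        // Option 2: everything in a single family.
        Map<String, String> option2 = Map.of("M1", "d", "M2", "d", "P", "d");

        System.out.println(familiesTouched(option1, query)); // prints [meta, payload]
        System.out.println(familiesTouched(option2, query)); // prints [d]
    }
}
```

Under option 1 the query reads both families in the same request, which is not the "one column family or the other but usually not both" access pattern that the reference gives as the reason to add a second family. A spike with realistic data would still be the way to confirm this.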

