Google Cloud Dataflow - How does Apache Beam access Bigtable data?

Asked Oct 7, 2021 in Technique by 深蓝

When BigtableIO.Read runs in Dataflow, is the data accessed through a Bigtable node, or does the read go directly to the Bigtable tablets?

The Bigtable architecture overview states:

client requests go through a front-end server before they are sent to a Cloud Bigtable node

and goes on to say:

A Cloud Bigtable table is sharded into blocks of contiguous rows, called tablets to help balance the workload of queries... Tablets are stored on Colossus, Google's file system, in SSTable format

(The concern is contention: if a Dataflow job runs at the same time as users are making individual requests, which definitely go through the nodes, will the job cause a small or a large amount of contention? I would guess that a Dataflow job going through the nodes causes significantly more contention than one reading the tablets directly.)

Question from: https://stackoverflow.com/questions/65876283/how-does-apache-beam-access-bigtable-data

1 Answer

深蓝 · Answer 1 · 2021-10-06T19:22:22+0000

The Beam Bigtable connector uses Cloud Bigtable's public API, so its requests go through the Bigtable front-end servers and nodes like any other client request. It does not read the tablets directly.

See here for a bit more detail on how the Beam connector uses the Bigtable client API.
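
To make the request path concrete, here is a minimal toy model in Python. It is not Bigtable or Beam client code: the `Node` and `FrontEnd` classes, the split key, and the row keys are all hypothetical, and for simplicity each node owns exactly one tablet (a real node manages many). It only illustrates the point above: a batch scan and a user's point read enter through the same front end and are served by the same nodes, so they draw on the same node capacity.

```python
from bisect import bisect_right
from collections import Counter


class Node:
    """Toy stand-in for a Cloud Bigtable node: counts the requests it serves."""

    def __init__(self, name):
        self.name = name
        self.served = Counter()  # requests handled, keyed by client label

    def read(self, client, row_key):
        self.served[client] += 1
        return f"{row_key} served by {self.name}"


class FrontEnd:
    """Toy front-end server: routes a row key to the node owning its tablet."""

    def __init__(self, split_keys, nodes):
        # Sorted split keys cut the key space into len(split_keys) + 1
        # contiguous row ranges (tablets); here, one tablet per node.
        assert len(nodes) == len(split_keys) + 1
        self.split_keys = split_keys
        self.nodes = nodes

    def read(self, client, row_key):
        tablet = bisect_right(self.split_keys, row_key)
        return self.nodes[tablet].read(client, row_key)


nodes = [Node("node-0"), Node("node-1")]
front_end = FrontEnd(split_keys=["m"], nodes=nodes)

# A batch job scanning several rows (hypothetical keys) ...
for key in ["a", "b", "c", "n", "z"]:
    front_end.read("dataflow", key)

# ... and a user's single point read take the same path.
front_end.read("user", "b")

for node in nodes:
    print(node.name, dict(node.served))
# node-0 {'dataflow': 3, 'user': 1}
# node-1 {'dataflow': 2}
```

In this sketch the user's read of row `b` lands on `node-0`, the same node that already served three of the batch job's reads, which is the contention the question asks about.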
