Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
260 views
in Technique[技术] by (71.8m points)

tinkerpop3 - How to perform pagination in Gremlin

In Tinkerpop 3, how to perform pagination? I want to fetch the first 10 elements of a query, then the next 10 without having to load them all in memory. For example, the query below returns 1000,000 records. I want to fetch them 10 by 10 without loading all the 1000,000 at once.

g.V().has("key", value).limit(10)

Edit

A solution that works through HttpChannelizer on Gremlin Server would be ideal.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

From a functional perspective, a nice looking bit of Gremlin for paging would be:

gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 0, 2)).
                 by(count(local))
==>[persons:[v[1],v[2]],count:4]
gremlin> g.V().hasLabel('person').fold().as('persons','count').
               select('persons','count').
                 by(range(local, 2, 4)).
                 by(count(local))
==>[persons:[v[4],v[6]],count:4]

In this way you get the total count of vertices with the result. Unfortunately, the fold() forces you to count all the vertices which will require iterating them all (i.e. bringing them all into memory).

There really is no way to avoid iterating all 100,000 vertices in this case as long as you intend to execute your traversal in multiple separate attempts. For example:

gremlin> g.V().hasLabel('person').range(0,2)
==>v[1]
==>v[2]
gremlin> g.V().hasLabel('person').range(2,4)
==>v[4]
==>v[6]

The first statement is the same as if you'd terminated the traversal with limit(2). On the second traversal, that only wants the second two vertices, it not as though you magically skip iterating the first two as it is a new traversal. I'm not aware of any TinkerPop graph database implementation that will do that efficiently - they all have that behavior.

The only way to do ten vertices at a time without having them all in memory is to use the same Traversal instance as in:

gremlin> t = g.V().hasLabel('person');[]
gremlin> t.next(2)
==>v[1]
==>v[2]
gremlin> t.next(2)
==>v[4]
==>v[6]

With that model you only iterate the vertices once and don't bring them all into memory at a single point in time.

Some other thoughts on this topic can be found in this blog post.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...