
java - JDBC batch operations understanding

I use Hibernate ORM and PostgreSQL in my application, and sometimes I use batch operations (a simplified sketch of my batch code is at the end of this question). At first I didn't understand why, with a batch size of 25, 25 queries show up in the logs, and I thought batching was not working correctly. But then I looked at the source code of the PostgreSQL JDBC driver and found the following lines in the PgStatement class:

    public int[] executeBatch() throws SQLException {
        this.checkClosed();
        this.closeForNextExecution();
        if (this.batchStatements != null && !this.batchStatements.isEmpty()) {
            this.transformQueriesAndParameters();
            // the next line confuses me, because we end up with an array of identical queries
            Query[] queries = (Query[])this.batchStatements.toArray(new Query[0]);
            ParameterList[] parameterLists = (ParameterList[])this.batchParameters.toArray(new ParameterList[0]);
            this.batchStatements.clear();
            this.batchParameters.clear();

and the following in the PgPreparedStatement class:

    public void addBatch() throws SQLException {
        checkClosed();
        if (batchStatements == null) {
          batchStatements = new ArrayList<Query>();
          batchParameters = new ArrayList<ParameterList>();
        }

        batchParameters.add(preparedParameters.copy());
        Query query = preparedQuery.query;
        // this condition confuses me
        if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {
          batchStatements.add(query);
        }
      }

So it turns out that with a batch size of 25, 25 queries are sent, each with its parameters attached.

The database logs confirm this, for example:

2017-12-06 01:22:08.023 MSK [18402] postgres@buzzfactory LOG:  execute S_3: BEGIN
2017-12-06 01:22:08.024 MSK [18402] postgres@buzzfactory LOG:  execute S_4: select nextval ('tests_id_seq')
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory LOG:  execute S_2: insert into tests (name, id) values ($1, $2)
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory DETAIL:  parameters: $1 = 'test', $2 = '1'
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory LOG:  execute S_2: insert into tests (name, id) values ($1, $2)
2017-12-06 01:22:08.041 MSK [18402] postgres@buzzfactory DETAIL:  parameters: $1 = 'test', $2 = '2'
...
23 more queries with parameters
...
2017-12-06 01:22:08.063 MSK [18402] postgres@buzzfactory LOG:  execute S_5: COMMIT

But I thought a single query would be executed with an array of 25 parameter sets. Or do I misunderstand how batch inserts work with a prepared statement? Why is the same query duplicated n times?

I also tried to debug my queries at this line

if (!(query instanceof BatchedQuery) || batchStatements.isEmpty()) {

and noticed that my queries are always instances of SimpleQuery rather than BatchedQuery. Maybe this is the key to the problem? I couldn't find any information about BatchedQuery.
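
For completeness, here is a simplified sketch of the kind of batch insert I run. The Test entity and the row count are placeholders, not my real code; hibernate.jdbc.batch_size is set to 25 in my configuration.

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    // Sketch only: Test is a placeholder @Entity with a sequence-generated id and a name column.
    static void insertTests(SessionFactory sessionFactory) {
        Session session = sessionFactory.openSession();
        Transaction tx = session.beginTransaction();
        for (int i = 0; i < 100; i++) {
            Test test = new Test();
            test.setName("test");
            session.persist(test);
            if ((i + 1) % 25 == 0) {   // every 25 entities (the batch size)
                session.flush();       // push the current batch through JDBC
                session.clear();       // detach flushed entities to keep the session small
            }
        }
        tx.commit();
        session.close();
    }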


1 Answer


There are various kinds of batching involved; I will cover the PostgreSQL JDBC driver (pgjdbc) part of it.

TL;DR: pgjdbc does use fewer network round trips when the batch API is used. BatchedQuery is used only if reWriteBatchedInserts=true is passed in the pgjdbc connection settings.

You might find https://www.slideshare.net/VladimirSitnikv/postgresql-and-jdbc-striving-for-high-performance relevant (slide 44,...)
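
For reference, here is a minimal sketch of passing that property to the driver; host, database, user and password below are placeholders, and the property can also be appended to the JDBC URL (e.g. when the URL is configured through Hibernate).

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;
    import java.util.Properties;

    // Minimal sketch; connection details are placeholders.
    static Connection openConnection() throws SQLException {
        Properties props = new Properties();
        props.setProperty("user", "postgres");
        props.setProperty("password", "secret");
        // Ask pgjdbc to rewrite batched inserts into multi-values form:
        props.setProperty("reWriteBatchedInserts", "true");
        return DriverManager.getConnection("jdbc:postgresql://localhost:5432/testdb", props);
        // Equivalent URL-only form:
        // jdbc:postgresql://localhost:5432/testdb?reWriteBatchedInserts=true
    }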

When it comes to query execution, network latency is often a significant part of the elapsed time.

Suppose the task is to insert 10 rows.

  1. No batching (e.g. just PreparedStatement#execute in a loop). The driver performs the following:

    execute query
    sync <-- wait for the response from the DB
    execute query
    sync <-- wait for the response from the DB
    execute query
    sync <-- wait for the response from the DB
    ...
    

    A notable amount of time is spent just waiting for the response from the DB.

  2. JDBC batch API. That is, PreparedStatement#addBatch() enables the driver to send multiple "query executions" in a single network round trip (a plain-JDBC sketch of this usage follows this list). The current implementation, however, still splits very large batches into smaller ones to avoid a TCP deadlock.

    The sequence of actions looks much better:

    execute query
    ...
    execute query
    execute query
    execute query
    sync <-- wait for the response from the DB
    
  3. Note that even with #addBatch, there is overhead from the individual "execute query" commands: it takes the server a notable amount of time to process each message individually.

    One of the ways to reduce the number of queries is to use a multi-values insert. For instance:

    insert into tab(a,b,c) values (?,?,?), (?,?,?), ..., (?,?,?)
    

    This enables PostgreSQL to insert multiple rows at once. The drawback is that you don't get detailed (per-row) error messages. Currently, Hibernate does not implement multi-values inserts.

    However, since 9.4.1209 (2016-07-15) pgjdbc can rewrite regular batch inserts into multi-values form on the fly.

    In order to activate the multi-values rewrite, you need to add the reWriteBatchedInserts=true connection property, as shown in the snippet near the top of this answer. The feature was initially developed in https://github.com/pgjdbc/pgjdbc/pull/491

    It is smart enough to use 2 statements in order to insert 10 rows: the first one is an 8-valued statement, and the second one is a 2-valued statement. Using powers of two enables pgjdbc to keep the number of distinct statements sane, and that improves performance, since often-used statements are server-prepared (see What's the life span of a PostgreSQL server-side prepared statement).

    BatchedQuery represents that kind of multi-valued statement, so you will see that class used only in the reWriteBatchedInserts=true case.

    The drawbacks of the feature include less detail in the "batch result". For instance, a regular batch gives you a per-statement row count, whereas in the multi-values case you just get a "statement completed" status. On top of that, the on-the-fly rewriter might fail to parse certain SQL statements (e.g. https://github.com/pgjdbc/pgjdbc/issues/1045 ).
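
Putting items 2 and 3 together, here is a minimal plain-JDBC sketch of the batch API. The table and values follow the question, the row count of 10 matches the example above, and the connection is assumed to be opened as in the earlier snippet; with reWriteBatchedInserts=true the very same code is rewritten into multi-values inserts by the driver.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Minimal sketch; assumes "con" was opened as shown earlier in this answer.
    static void insertTenRows(Connection con) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "insert into tests (name, id) values (?, ?)")) {
            for (int i = 1; i <= 10; i++) {
                ps.setString(1, "test");
                ps.setInt(2, i);
                ps.addBatch();   // queue one parameter set; nothing is sent yet
            }
            // Without reWriteBatchedInserts: a separate execute per row,
            // all sent in a single network round trip (case 2 above).
            // With reWriteBatchedInserts=true: rewritten into multi-values inserts,
            // here an 8-valued statement plus a 2-valued one (case 3 above).
            int[] rowCounts = ps.executeBatch();
        }
    }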

