Re: JDBC prepared statement: a 32767 limit of arguments number

From: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Vladislav Malyshkin <mal(at)gromco(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-jdbc(at)lists(dot)postgresql(dot)org" <pgsql-jdbc(at)lists(dot)postgresql(dot)org>
Subject: Re: JDBC prepared statement: a 32767 limit of arguments number
Date: 2022-03-10 09:08:26
Message-ID: CAB=Je-G7Xa9KR_sMYuO7TKOURjT7wQ7rdbn-k0trSebMNaaYhg@mail.gmail.com
Lists: pgsql-jdbc

Vladislav,

Have you tried the PreparedStatement#addBatch + #executeBatch APIs together
with the reWriteBatchedInserts=true connection option?
That might yield the expected speedup without resorting to
PostgreSQL-specific APIs and syntax like COPY.
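As a minimal sketch of that approach (the table, column, and helper names
here are my own, not a pgjdbc API; the rewrite option is passed in the
connection URL):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsertSketch {

    // Hypothetical helper: append the pgjdbc rewrite option to a JDBC URL.
    static String withRewriteOption(String url) {
        return url + (url.contains("?") ? "&" : "?") + "reWriteBatchedInserts=true";
    }

    // Assumes a hypothetical table "testbatch(id int, value text)".
    static void insertAll(Connection con, int[] ids, String[] values) throws SQLException {
        try (PreparedStatement ps =
                 con.prepareStatement("INSERT INTO testbatch(id, value) VALUES (?, ?)")) {
            for (int i = 0; i < ids.length; i++) {
                ps.setInt(1, ids[i]);
                ps.setString(2, values[i]);
                ps.addBatch();       // queue the row client-side
            }
            ps.executeBatch();       // one round trip; pgjdbc may rewrite to multi-values
        }
    }
}
```

A connection obtained via
withRewriteOption("jdbc:postgresql://localhost:5432/test") would then let
the driver rewrite the batch into a multi-values insert transparently.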

David>You do have to deal with converting programming data types to their
textual representations for the COPY command but the driver should be
capable of helping with that

That's right, and the driver might add language-specific APIs for
converting data for COPY FROM.
However:

1) The JDBC specification has nothing for COPY. In other words, code that
uses COPY-specific APIs is hard to port to other databases.
Of course, pull requests adding a Java API on top of COPY are welcome;
however, nobody has contributed the wrapper APIs yet.
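pgjdbc does expose a PostgreSQL-specific entry point,
org.postgresql.copy.CopyManager#copyIn, but the caller still has to render
each row in COPY's text format themselves. A hedged sketch of that
rendering (the helper names are my own, not a pgjdbc API):

```java
// Renders one row in COPY's default text format: tab-separated columns,
// \N as the NULL marker, and backslash escapes for tab/newline/backslash.
// The resulting lines would be fed to org.postgresql.copy.CopyManager#copyIn
// (a pgjdbc-specific API, not part of the JDBC specification).
public class CopyTextFormat {

    static String escape(String field) {
        if (field == null) {
            return "\\N";                       // COPY text-format NULL marker
        }
        return field.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    static String toCopyRow(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append('\t');
            sb.append(escape(fields[i]));
        }
        return sb.append('\n').toString();
    }
}
```

This is exactly the kind of per-language conversion code a wrapper API
could hide from applications.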

2) COPY FROM is either all binary or all text. It does not allow a
"fallback to text format" for individual fields.
We might need to revisit those decisions; however, pgjdbc currently always
sends date/timestamp-like values as text even though it knows how to encode
and decode them in binary, and structs and arrays are not supported in
binary encoding yet :'(

I think COPY syntax could be relaxed here to allow per-field binary/text
format configuration.

3) COPY FROM is "insert-only", and it can't handle conflicts (COPY FROM
supports neither insert on conflict nor merge).

4) There's a connection option, reWriteBatchedInserts=true, with which
pgjdbc rewrites batched calls to "insert into ..." into a single
multi-values insert:

insert into table(c1, c2, c3, ...) values (....)
=>
insert into table(c1, c2, c3, ...) values (....), (...), (...)
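The rewrite is essentially string surgery on the VALUES clause. A
simplified sketch (pgjdbc's real parser also handles quoting, comments,
ON CONFLICT, RETURNING, and so on, which this one deliberately ignores):

```java
public class MultiValuesRewrite {

    // Naive sketch: duplicate the single (...) tuple after VALUES
    // `batchSize` times. Unlike pgjdbc's parser, this does not handle
    // quoted strings, comments, or trailing clauses.
    static String rewrite(String sql, int batchSize) {
        int i = sql.toUpperCase().indexOf("VALUES");
        if (i < 0 || batchSize < 1) {
            throw new IllegalArgumentException("not a simple INSERT ... VALUES");
        }
        String head = sql.substring(0, i + "VALUES".length());
        String tuple = sql.substring(i + "VALUES".length()).trim();
        StringBuilder sb = new StringBuilder(head);
        for (int b = 0; b < batchSize; b++) {
            sb.append(b == 0 ? " " : ", ").append(tuple);
        }
        return sb.toString();
    }
}
```

Even this toy version shows why the driver must understand the statement
shape before rewriting it.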

It is a pity that the driver has to parse the SQL and detect whether it
looks like "insert into ...".
On the other hand, the optimization is compatible with standard SQL syntax,
and it requires no application changes except for not relying on the
"number of affected rows for each bound row".

Typically executeBatch() returns the exact row count for each batched row,
and the driver loses that information when it flips to "rewrite into
multi-values" mode.
That is why the driver can't activate batch rewriting by default.
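Concretely, the JDBC spec covers this case with Statement.SUCCESS_NO_INFO
(-2): a driver that executed a batched statement successfully but can't
report a per-row count returns that constant instead. A sketch of what the
caller might see (the connection setup and table are assumed; whether
pgjdbc returns SUCCESS_NO_INFO for rewritten batches is my reading, not a
quote from its docs):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;

public class BatchResultSketch {

    // With plain batching, each entry is the exact row count (typically 1).
    // With reWriteBatchedInserts=true, the driver may return
    // Statement.SUCCESS_NO_INFO (-2) because the multi-values insert
    // reports only one aggregate count for all rows.
    static int countUnknownResults(int[] updateCounts) {
        int unknown = 0;
        for (int c : updateCounts) {
            if (c == Statement.SUCCESS_NO_INFO) {
                unknown++;
            }
        }
        return unknown;
    }

    static void run(Connection con) throws SQLException {
        try (PreparedStatement ps =
                 con.prepareStatement("INSERT INTO testbatch(id, value) VALUES (?, ?)")) {
            ps.setInt(1, 1); ps.setString(2, "a"); ps.addBatch();
            ps.setInt(1, 2); ps.setString(2, "b"); ps.addBatch();
            int[] counts = ps.executeBatch();
            System.out.println("rows without exact counts: " + countUnknownResults(counts));
        }
    }
}
```

An application that inspects the executeBatch() result array for exact
counts would therefore break under the rewrite, which is why it stays
opt-in.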

----

It might be interesting to check whether the driver could rewrite simple
insert statements into COPY FROM STDIN; however, that "autoconversion from
INSERT to COPY" would have to be disabled as soon as any non-trivial INSERT
feature appeared.

For instance, the driver can't convert literal values and expressions in
the VALUES clause into COPY:
INSERT INTO testbatch(id, value, comment) VALUES (?, ?, 'no comment');
INSERT INTO testbatch(id, value) VALUES (1+2, ?);

Vladimir
