Re: Resurrected thread: Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Jeremy Whiting <jwhiting(at)redhat(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, magog001(at)web(dot)de
Subject: Re: Resurrected thread: Speed improvement - Group batch Insert - Rewrite the INSERT at the driver level (using a parameter)
Date: 2015-04-09 14:52:44
Message-ID: CAMsr+YFvynME+FrbP_8C+VjCG1m0GRofaW_u3F+tEBg+U+YPAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

On 26 March 2015 at 03:34, Jeremy Whiting <jwhiting(at)redhat(dot)com> wrote:

>
> For use cases involving applications using an ORM like Hibernate COPY
> isn't supported nor likely to. Hibernate doesn't have any concept of
> handling files on the database system.
>

There is no need for a file on the database system. It's just a stream of
rows in CSV-like form, sent from the client to the server over the
PostgreSQL wire protocol. PgJDBC's COPY protocol support uses the
server-side COPY ... FROM STDIN command to implement this

Nonetheless, it's extension API, not something that's part of the JDBC
interfaces, and it's not likely an ORM would adopt it.

> What are the thoughts for having this optimization introduced into pgjdbc
> driver ?
>

IMO: Use batching. The JDBC API provides batching features, and PgJDBC
implements them. A PreparedStatement batch does a single Parse, then
repeated Bind/Describe/Execute statements, and will perform very well.

Failure to use such APIs is a bug in the ORM.

There's room for improvement in PgJDBC's current batching support though.
For one thing it does its buffer management totally backwards (see github
issues), meaning it has to force round-trips more often than it would
ideally need to. Further improvement would involve reading replies in a
separate thread so we didn't have to force Sync at all.

To further improve things (in sane ways that didn't involve some scary
query-parsing hacks) would involve a PostgreSQL protocol enhancement to let
PostgreSQL accept parameter arrays via table-valued Bind messages. So we
could essentially send "INSERT INTO ... VALUES $TABLE" . I'd love to have
this, but I don't really see it being implemented when we already have
COPY; someone would have to provide a really compelling argument and a
solid implementation.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-jdbc by date

  From Date Subject
Next Message Dave Cramer 2015-04-09 15:01:34 Re: Bug report: TCP deadlock between JDBC & Postgres
Previous Message Royce Ausburn 2015-04-09 02:30:25 Bug report: TCP deadlock between JDBC & Postgres