Re: V3 protocol, batch statements and binary transfer

From: Alan Stange <stange(at)rentec(dot)com>
To: pg(at)fastcrypt(dot)com
Cc: Andrea Aime <andrea(dot)aime(at)aliceposta(dot)it>, PostgreSQL JDBC Mailing List <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: V3 protocol, batch statements and binary transfer
Date: 2004-03-30 22:10:32
Message-ID: 4069F058.9060801@rentec.com
Lists: pgsql-jdbc

Hello all,

We have the same performance problems with bulk data inserts from JDBC.
We also used batches, but made sure that each statement in the batch was
large (~128KB) and inserted many rows at a time. This cut down on the
number of round trips to the PostgreSQL server.
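To illustrate, here is a minimal sketch of how such large multi-row INSERT
statements can be assembled (the table name and values are hypothetical, and
the actual JDBC execute call is elided; only the statement-building is shown):

```java
import java.util.ArrayList;
import java.util.List;

public class MultiRowInsertBuilder {
    // Flush once a statement grows past roughly 128KB, as described above.
    static final int FLUSH_BYTES = 128 * 1024;

    // Collect the finished statements; in real use each one would go to
    // Statement.executeUpdate() (or addBatch()) instead of a list.
    static List<String> buildStatements(String table, String[][] rows) {
        List<String> statements = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            if (sb.length() == 0) {
                sb.append("INSERT INTO ").append(table).append(" VALUES ");
            } else {
                sb.append(", ");
            }
            sb.append('(');
            for (int i = 0; i < row.length; i++) {
                if (i > 0) sb.append(", ");
                // Escape embedded single quotes for the SQL literal.
                sb.append('\'').append(row[i].replace("'", "''")).append('\'');
            }
            sb.append(')');
            if (sb.length() >= FLUSH_BYTES) {
                statements.add(sb.toString());
                sb.setLength(0);
            }
        }
        if (sb.length() > 0) statements.add(sb.toString());
        return statements;
    }
}
```

Each emitted statement carries many rows, so one round trip covers many
inserts instead of one.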

In addition to a) and b) below, I'd add that the read size off the
sockets is too small: it's only a few KB currently, and it should
definitely be bumped up to a larger number.
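For reference, the OS-level receive buffer can be requested on a plain
java.net.Socket as below (a sketch only; where the driver would set this
internally is a separate question, and the value is just a hint that the
OS may round or clamp):

```java
import java.net.Socket;
import java.net.SocketException;

public class BufferSizeDemo {
    public static void main(String[] args) throws SocketException {
        // An unconnected socket is enough to demonstrate the setting;
        // in a driver this would be done before or while connecting.
        Socket s = new Socket();
        s.setReceiveBufferSize(64 * 1024);  // ask the OS for a 64KB buffer
        // The OS may adjust the requested size, so just report what we got.
        System.out.println("receive buffer: " + s.getReceiveBufferSize());
    }
}
```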

We're running on a gigE network and see data rates of about 50MB/s coming
off the server (using a 2GB shared memory region). This sounds nice,
but one has to keep in mind that the data is binary values encoded as text.
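As a quick illustration of the text overhead, compare the wire size of a
single double in the two representations (a binary float8 is a fixed 8
bytes; the text form is usually longer, and that is before any quoting):

```java
public class EncodingSizeDemo {
    public static void main(String[] args) {
        double v = 12345.678901234567;
        String text = Double.toString(v);  // what the text protocol ships
        int binaryBytes = Double.BYTES;    // 8 bytes as a binary float8
        System.out.println("text bytes:   " + text.length());
        System.out.println("binary bytes: " + binaryBytes);
    }
}
```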

Anyway, count me in to work on the jdbc client as well (in my limited
time). To start, I have a couple of local performance hacks for which
I should submit proper patches.

-- Alan

Dave Cramer wrote:

>Andrea,
>
>Feel free to chip in, if you can help with the V3 implementation your
>patches would be greatly appreciated.
>
>Dave
>On Tue, 2004-03-30 at 03:44, Andrea Aime wrote:
>
>
>>Hi jdbc driver hackers,
>>my name is Andrea and I'm hitting some serious performance problems with the driver.
>>At present I'm working with the PostGIS extension and seeing very low
>>performance during mass data insertion due to driver limitations. Basically,
>>I want to turn a 20 MB shapefile into a Postgres table, but it has to be an import
>>function on the client side (a Windows PC), so I can't just go to the command line and
>>issue a COPY. That's just an example; in general I need to perform mass inserts
>>or updates in a transactional environment from a client, usually a Windows PC.
>>
>>As far as I can tell the low performance level is due to:
>>a) lack of true support for batch statements as introduced by the V3 protocol, which
>>   makes the network latency bite me very badly while I'm inserting those 100,000 rows
>>b) use of text mode instead of binary, which more than doubles the size of the data
>>   actually transferred over the wire
>>
>>As it is, inserting the above file takes more than 2 minutes on 100Mbit Ethernet
>>(and I have to pass through 3 switches, so the latency is not that good). A
>>reasonable transfer time for that amount of data would be under 30 seconds, IMHO.
>>
>>I'm wondering: why do you use text mode instead of the more efficient binary one?
>>Secondly, reading the emails in the archive, it appears you are short of time
>>for implementing the V3 protocol. Can I help somehow?
>>
>>Best regards
>>Andrea Aime
>>
>>
