Re: [JDBC] BUG #1347: Bulk Import stopps after a while ( 8.0.0.

From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kris Jurka <books(at)ejurka(dot)com>, Bahadur Singh <bahadursingh(at)yahoo(dot)com>, pgsql-bugs(at)postgresql(dot)org, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: [JDBC] BUG #1347: Bulk Import stopps after a while ( 8.0.0.
Date: 2004-12-13 20:58:16
Message-ID: 41BE0268.1060602@opencloud.com
Lists: pgsql-bugs pgsql-jdbc

Tom Lane wrote:
> Kris Jurka <books(at)ejurka(dot)com> writes:
>
>> // To avoid this, we guess at how many queries we can send before the
>> // server -> driver stream's buffer is full (MAX_BUFFERED_QUERIES).
>
>
> It seems essentially impossible for the driver to do this reliably,
> since it has no clue how much data any one query will return.

Right, but I'm not convinced that this is the problem here, as batch
execution in JDBC is only allowed to run queries that do not return
result sets. The only case I can think of where this would break is if
something is causing lots of logging output to be sent to the client
(triggers, etc.).
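
(For context, the current logic is roughly the following. This is a
simplified sketch, not the actual driver source: MAX_BUFFERED_QUERIES
matches the quoted comment, but the helper methods and stream handling
are illustrative stand-ins.)

import java.io.IOException;
import java.util.List;

// Count-based subdivision: keep writing queries, but periodically stop
// to drain the server's responses so the server -> driver pipe cannot
// fill up and deadlock both ends.
abstract class BatchSketch {
    static final int MAX_BUFFERED_QUERIES = 256;

    void executeBatch(List<String> queries) throws IOException {
        int buffered = 0;
        for (String sql : queries) {
            sendBindExecute(sql);         // driver -> server; no reply read yet
            if (++buffered >= MAX_BUFFERED_QUERIES) {
                sendSync();               // force the server to respond
                drainResponses(buffered); // empty the server -> driver pipe
                buffered = 0;
            }
        }
        sendSync();
        drainResponses(buffered);
    }

    // Hypothetical protocol helpers, standing in for the real driver I/O.
    abstract void sendBindExecute(String sql) throws IOException;
    abstract void sendSync() throws IOException;
    abstract void drainResponses(int expected) throws IOException;
}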

> How about instead thinking in terms of not filling the driver->server
> channel? You have a good handle on how much data you have sent (or at
> least you could keep track of that), and if you bound it to 64K or so
> then you should be safe. Perhaps the limit ought to be easily
> configurable just in case, but at least you'd be measuring something
> measurable.

That's possibly a better idea, but it does mean we wouldn't be able to
batch inserts that contain lots of data, and that's exactly the use case
I needed to support when I wrote this code in the first place.

Also, it's never going to be 100% reliable without a separate reader
thread, as the server can spontaneously generate output (e.g. because of
NOTIFY) regardless of how careful we are with our queries.
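
To make the tradeoff concrete, a byte-bounded variant would look
something like the sketch below. The 64K figure is your suggested bound;
everything else (the names, the pre-encoded messages) is an illustrative
stand-in. Note how a batch of large inserts degenerates to one query per
round trip.

import java.io.IOException;
import java.util.List;

// Byte-based subdivision: drain whenever the data written since the
// last Sync would exceed the bound on the driver -> server channel.
abstract class ByteBoundedBatchSketch {
    static final int MAX_BUFFERED_BYTES = 64 * 1024; // the suggested bound

    void executeBatch(List<byte[]> encodedQueries) throws IOException {
        int bufferedBytes = 0;
        int bufferedQueries = 0;
        for (byte[] msg : encodedQueries) {
            // A single big INSERT can exceed the bound by itself, so each
            // such query ends up costing a full round trip.
            if (bufferedQueries > 0
                    && bufferedBytes + msg.length > MAX_BUFFERED_BYTES) {
                sendSync();
                drainResponses(bufferedQueries);
                bufferedBytes = 0;
                bufferedQueries = 0;
            }
            write(msg);
            bufferedBytes += msg.length;
            bufferedQueries++;
        }
        sendSync();
        drainResponses(bufferedQueries);
    }

    // Hypothetical protocol helpers, as in the previous sketch.
    abstract void write(byte[] msg) throws IOException;
    abstract void sendSync() throws IOException;
    abstract void drainResponses(int expected) throws IOException;
}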

There's actually another problem with this code: the subdivision into
smaller batches is not transparent if autocommit is on. We send a Sync
at the end of each sub-batch, which causes an implicit commit. We should
be sending a Flush instead, but that is harder for the driver to handle:
a Flush does not provoke a response message from the server, so we would
have to track the protocol state more closely. Given that the JDBC spec
is silent about the autocommit semantics of batch execution anyway, I'm
not too worried about fixing this urgently.
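
For the record, the wire-level difference between the two is trivial:
under the v3 protocol both are single-byte message types with an empty
payload, as in the sketch below. The expense is entirely in the extra
protocol-state tracking on the driver side.

import java.io.DataOutputStream;
import java.io.IOException;

// v3 protocol: Sync ('S') ends the implicit transaction and is answered
// with ReadyForQuery; Flush ('H') merely asks the backend to deliver
// any pending output and gets no reply of its own, which is why the
// driver would need to track protocol state more closely to know when
// it has read everything.
final class ProtocolSketch {
    static void sendSync(DataOutputStream out) throws IOException {
        out.writeByte('S');
        out.writeInt(4);   // the length word counts itself; no payload
        out.flush();
    }

    static void sendFlush(DataOutputStream out) throws IOException {
        out.writeByte('H');
        out.writeInt(4);
        out.flush();
    }
}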

I'd like to confirm that this is really the problem before tweaking the
code. Given that the OP said batch sizes of 1000-2000 worked OK, I'm not
sure this code is at fault, since the maximum number of queries we'll
send per sub-batch is around 250 by default.

-O
