Re: OutOfMemory when inserting stream of unknown length

From: "Mikko T(dot)" <mtiihone(at)cc(dot)hut(dot)fi>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: OutOfMemory when inserting stream of unknown length
Date: 2004-08-20 09:23:18
Message-ID: Pine.OSF.4.60.0408201155140.368856@kosh.hut.fi
Lists: pgsql-jdbc

On Fri, 20 Aug 2004, Oliver Jowett wrote:

>> The PreparedStatement.setBinaryStream(int parameterIndex,
>> InputStream x,
>> int length)
>>
>> Sets the designated parameter to the given input stream, which will have
>> the specified number of bytes.
>
> This sounds like a requirement to me -- i.e. it is an error to pass a stream
> that does not have the specified number of bytes.
>
>> .. The data will be read from the stream as needed until end-of-file is
>> reached.
>
> .. but as usual the JDBC javadoc goes on to contradict itself. sigh. I wish
> sun could come up with a proper spec for a change, not just a collection of
> partially-documented APIs.
>
> As I see it the reason for having a length value there is so that the driver
> can stream the data directly to the DB even when it needs to know the length
> ahead of time. This is exactly the case with the postgresql driver. If we
> can't trust the length field to be accurate, then we must read the entire
> stream into heap before starting. In that case setBinaryStream() is no better
> than setBytes()..

But the current implementation forces the user of setBinaryStream to buffer
all the bytes in memory just to find out how many bytes the stream actually
holds. At that point the user might as well call setBytes(), which makes
setBinaryStream useless whenever the stream length is not known beforehand.
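To make the workaround concrete, here is a minimal sketch (the helper name
readAll is my own, not from the driver) of what a caller is currently forced
to do: drain the whole stream into heap just to learn its length, at which
point setBytes() would serve equally well.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamBuffer {
    // Read an entire stream into memory just to learn its length.
    // This is exactly the buffering that defeats the point of
    // setBinaryStream when the length is unknown up front.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100000]; // stand-in for an unknown-length source
        byte[] all = readAll(new ByteArrayInputStream(data));
        // Only now is the length known; the caller could just as well do
        // ps.setBytes(1, all) instead of setBinaryStream(1, in, all.length).
        System.out.println(all.length);
    }
}
```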

> I could live with an interpretation that says "always store exactly length
> bytes, but then read to EOF if there are extra bytes left over". It would
> still be an error to supply less than 'length' bytes in the stream; I think
> this is better than padding with 0 bytes or anything similar (by the time the
> driver sees EOF, it is committed at the protocol level to writing a parameter
> of length 'length', so it can't just stop at EOF).

I think that committing to send very large data values that are not even
guaranteed to exist (or be available) makes it impossible to return the
protocol to a known state if for some reason the data cannot be sent, or the
last command must be aborted. With the current solution the only option seems
to be to kill the whole connection, which I find quite an extreme thing to do.

In my opinion the protocol should support sending the data in chunks (4K-16K
would probably be optimal). That way the JDBC driver only has to buffer one
chunk ahead, not the whole stream, and it always knows that no IOException or
premature EOF can occur mid-send. Chunked sending would also let the driver
cancel or roll back the streaming, on its own or when asked by the database,
even with megabytes of data still left to send.
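As a rough illustration of the idea (this framing is my own invention, not
the PostgreSQL wire protocol), each chunk could carry its own length header,
with a zero-length chunk marking the end. The sender then never commits to a
total length it cannot deliver: an error from the source stream only aborts
the current chunk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedSender {
    static final int CHUNK = 8192; // in the 4K-16K range suggested above

    // Copy a stream as length-prefixed chunks; a zero-length chunk marks EOF.
    // Because each chunk is framed independently, the sender only needs to
    // buffer one chunk ahead and can stop cleanly at any chunk boundary.
    static void sendChunked(InputStream in, DataOutputStream out) throws IOException {
        byte[] buf = new byte[CHUNK];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.writeInt(n);        // chunk header: payload length
            out.write(buf, 0, n);   // payload
        }
        out.writeInt(0);            // terminator: no more chunks
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendChunked(new ByteArrayInputStream(new byte[10000]), new DataOutputStream(wire));
        // 10000 bytes -> two chunks (8192 + 1808) plus a terminator,
        // each with a 4-byte header.
        System.out.println(wire.size());
    }
}
```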

-Mikko
