Frankly, I haven't touched the V2 and you are right, I did not apply
the patch to it. It's the first time I take a look at the driver so it
took me a little bit of adaptation, bare with me :)
With regards to individually escaping, it's just as expensive as how
it was before, except that it doesn't make several copies of the array
like it used to (hence the memory saving). I don't think there is any
performance impact at all. Basically just a memory gain.
As far as the encoding. I think in your original email you had
mentioned that the driver used UTF-8 (in which case there is an
obvious optimization that can be made), but I couldn't find it in the
driver. Everything looked like it was inheriting from the encoding
scheme set in the connection.
The stream is not necessary but I didn't like the idea of having bytea
logic inside the stream so I put it into that inner class to make it
easier to move if need be (although I couldn't find a good place for
it). Plus it makes it easier to help stream directly (once we figure this out).
Good point about over-escaping. This would probably resolve this whole
problem of pre-determining the size. I didn't think about this (but
then again, it assumes UTF-8 in all cases and I'm not familiar enough
with how the postgresql driver works to make this call).
Feel free to rework the patch as you want it. I just wanted to provide
a starting point and I'm happy to help out.
Monday, April 12, 2004, 10:08:44 PM, you wrote:
OJ> Michael Privat wrote:
>> Here you go. Coded against the CVS HEAD. Let me know if you want me to
>> change anything. Obviously, because I have to send the size ahead,
>> there is no other way but store the entire byte in memory (can't
>> stream), but with this code, I was able to load a blob of 30Mb+ (with
>> a heap size of 32Mb) against 1.5Mb yesterday with the same heap. So
>> there is some improvement.
OJ> Does this work for V2 connections? I didn't see a patch to the V2
OJ> execution path, but you've changed how setBytes() binds its argument.
OJ> The argument to setBytes() is not copied when bound; changes made to the
OJ> array after the setBytes() call but before query execution will be
OJ> visible. Personally I don't mind this behaviour but others disagree :)
OJ> -- see earlier discussion about mutable parameters in the pgsql-jdbc
OJ> Individually escaping and encoding each byte (in SendBytea and
OJ> GetByteaSize) seems overly expensive. We should be able to do this in
OJ> blocks at a minimum.
OJ> Why is EncodedStream necessary? Can't you just unroll that logic
OJ> directly in SendBytea()?
OJ> It's probably possible to work around needing the whole array in memory
OJ> (i.e. we could stream from an InputStream + length via setBinaryStream).
OJ> If we escape *every* byte, we need exactly 5 bytes of UTF8 output for
OJ> every input byte ('\' and '0'-'9' are all single bytes in UTF8, I
OJ> believe) so we could compute the output size directly from the input length.
OJ> That said, this patch is a lot simpler than I expected it to be! It
OJ> looks like a good solution until the driver knows how to speak V3
OJ> fluently. If you like, I'll see if I can rework your patch over the next
OJ> couple of days to address the above points.
OJ> ---------------------------(end of
OJ> TIP 6: Have you searched our list archives?
In response to
pgsql-jdbc by date
|Next:||From: Kris Jurka||Date: 2004-04-13 02:44:04|
|Subject: Re: bytea size limit?|
|Previous:||From: Oliver Jowett||Date: 2004-04-13 02:08:44|
|Subject: Re: bytea size limit?|