JDBC: LONGVARBINARY upload patch

From: Fernando Nasser <fnasser(at)redhat(dot)com>
To: PostgreSQL jdbc list <pgsql-jdbc(at)postgresql(dot)org>, pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: JDBC: LONGVARBINARY upload patch
Date: 2003-05-27 18:49:57
Message-ID: 3ED3B355.4080904@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc pgsql-patches

This patch drastically reduces the memory requirements and time taken to
upload large binary values to bytea fields, which are which JDBC
LONGVARBINARY have been implemented with.

The idea is to use this one for fixing setBinaryStream() and the patch
by Chris Smith <cdsmith(at)twu(dot)net> to fix setAsciiStream() and
setUnicodeStream(). In that case we do not have a backend function to
make a LO into a text field and at least it will not be all binary values.

I have included in the diff my previous patch "JDBC: Wrong type" as it
is a pre-requisite for both my patch and Chris' and that doesn't seem to
have been checked in yet.

Here is how we get to this solution:

I used a test program (written by Holger Haag <hhaag(at)gmx(dot)de>) to create
a bytea field and read it back. I've used a 54M text file for testing
(a binary file would perform worse as there would be more expansion into
octal representation).

With the current driver I needed mx = 660M to upload the file to a bytea
and it would take forever.

By fixing the code (not changing the method of doing it) I was able to
lower this requirement to mx=440M. This is the patch I've submitted as
"JDBC: Better initial capacity for StringBuffers reduces memory usage".
But remember this was an ASCII file, things would be much worse with a
binary one. And it still takes too much time to do.

Anyway, sending encoded binary at five characters per byte from the
front-end to the back-end and having that parsed is not a good option,
so I did change the method to avoid all that.

To do so, I had to use the interface that is provided to PostgreSQL
Large Objects (LOs) to get them into a LO and from there to the bytea
field itself. Tom Lane has kindly pointed me to a backend function that
could be used for loading the bytea from the LO.

The method creates a LO that is used as a staging area, moves the data
to the bytea destination and then removes the LO. If something goes
wrong this LO goes away as well as this is done inside a transaction
(LOs can only be manipulated inside a transaction -- but I did it
transparent to the driver's user). It is also secure as the LO will not
be seem by others until the transaction is committed and it will be
removed before that happens.

With this new method, an mx of only 70M is required! And it takes much
less time do do it (an order of magnitude less!).

But before you get overly excited be warned that there is a price to pay
for that. As I mentioned, I create and delete a LO in the process (to
use as the staging area). This uses up (temporarily) space in a table
called pg_largeobjects. Unfortunately, PostgreSQL still does not reuse
empty space on tables. So, this table will increase by the size of
every file uploaded (plus some index overhead). The only way to recup
this space is by running periodically the command:

VACUUM [ANALYZE] [VERBOSE] pg_largeobjects;

After this gets checked in, I will adjust Chris' patch to work with
non-binary data and probably also send a patch for getBytes() as well.

Regards to all,
Fernando

--
Fernando Nasser
Red Hat Canada Ltd. E-Mail: fnasser(at)redhat(dot)com
2323 Yonge Street, Suite #300
Toronto, Ontario M4P 2C9

Attachment Content-Type Size
JDBCLONGVARBINARY.PATCH text/plain 6.4 KB

Browse pgsql-jdbc by date

  From Date Subject
Next Message Dave Cramer 2003-05-28 09:53:39 Re: WARNING: ROLLBACK: no transaction in progress
Previous Message Fernando Nasser 2003-05-27 17:56:29 JDBC: Reinstate autocommit client hack for 7.4

Browse pgsql-patches by date

  From Date Subject
Next Message Peter Eisentraut 2003-05-27 19:42:29 Re: Adding Rendezvous support to postmaster
Previous Message Fernando Nasser 2003-05-27 17:56:29 JDBC: Reinstate autocommit client hack for 7.4