Re: Bugs in 7.1beta JDBC driver

From: Barry Lind <barry(at)xythos(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Bugs in 7.1beta JDBC driver
Date: 2001-01-13 02:43:04
Message-ID: 3A5FC0B8.DCD5980E@xythos.com
Lists: pgsql-interfaces

In looking at the current code from CVS, I don't see any changes that
relate to my original bug report. It appears that this is still
broken. Which files were patched, and at what versions?

thanks,
--Barry

Bruce Momjian wrote:
>
> Yesterday, I applied a patch to jdbc that fixes this particular area.
> Can you grab the current snapshot and let me know if this is still a
> problem?
>
> ---------------------------------------------------------------------------
>
> I recently pulled and built the 7.1beta1 code and ran our regression
> tests against it. Our app is Java based and uses the JDBC driver for
> client access. Almost all of our regression tests failed. Looking
> into this further, I found some significant problems in the latest
> JDBC code.
>
> Specifically, the code that was added to cache byte[]'s doesn't work
> correctly. It rests on an invalid assumption: that when a new SQL
> statement is executed, all previously used byte[]'s are no longer in
> use and can be reused. The culprit is the pg_stream.deallocate()
> call that appears in the Connection.ExecSQL() method:
>
> // Deallocate all resources in the stream associated
> // with a previous request.
> // This will let the driver reuse byte arrays that have already
> // been allocated instead of allocating new ones in order
> // to gain performance improvements.
> pg_stream.deallocate();
>
> The assumption here is invalid. For example, if code does the following:
>
> 1) execute a query A
> 2) for each row of the result set execute a different query B
>
> When query B is executed for the first row of query A, the
> deallocate() method gets called. That causes *all* byte[]'s
> originating from the pg_stream object (which includes the row data
> held in the ResultSet) to be reclaimed and reused, including those
> backing the rows of query A that have not yet been read.
>
> So what ends up happening is that the data in query A's result set
> starts to get corrupted as the underlying byte[]'s get reused by
> subsequent queries.
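>
> To make this concrete, here is a minimal sketch of the failing
> pattern, assuming conn is an open java.sql.Connection; the tables
> and columns are made up for illustration:
>
>     Statement stmtA = conn.createStatement();
>     ResultSet rsA = stmtA.executeQuery("SELECT name FROM foo");
>     while (rsA.next()) {
>         String owner = rsA.getString(1);
>         // Running query B goes through Connection.ExecSQL(), which
>         // calls pg_stream.deallocate() and hands *all* of rsA's
>         // cached byte[]'s back for reuse.
>         Statement stmtB = conn.createStatement();
>         ResultSet rsB = stmtB.executeQuery(
>             "SELECT count(*) FROM bar WHERE owner = '" + owner + "'");
>         rsB.next();
>         rsB.close();
>         stmtB.close();
>         // The not-yet-read rows of rsA may now have been overwritten
>         // by rsB reusing the same arrays.
>     }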
>
> Believe me, this causes all sorts of problems.
>
> A similar problem exists with the getBytes() method of ResultSet.
> This method returns the underlying byte[] to the caller. The
> caller could keep that array around well past the next ExecSQL call
> and find that its contents get corrupted as the array is reused by
> subsequent queries.
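>
> Until that changes, the only safe way to use getBytes() is to copy
> the array before the next statement runs; a sketch (rs is assumed
> to be a ResultSet positioned on a row):
>
>     byte[] raw = rs.getBytes(1);        // the driver's cached array
>     byte[] safe = (byte[]) raw.clone(); // defensive copy: raw may be
>                                         // reused after the next ExecSQL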
>
> In addition to the bugs above, there are other significant problems
> with this caching code. As written, it never frees any of the cached
> objects, which amounts to a memory leak when connections are long
> lived (e.g. in a connection pool). To illustrate the problem, say a
> connection performed the following queries:
>
> select 10_byte_char_column from foo;
> select 11_byte_char_column from bar;
> select 12_byte_char_column from foo_bar;
> select 13_byte_char_column from foo_foo;
> ...
>
> Further assume that each query returns 1000 rows.
> After the first query runs, the cache contains 1000 byte[10] arrays = 10K.
> After the second, it contains 1000 byte[10] + 1000 byte[11] arrays = 21K.
> After the third, it contains 1000 byte[10] + 1000 byte[11] + 1000
> byte[12] arrays = 33K.
> ...
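>
> The same arithmetic as a sketch, modeling (as the behaviour above
> suggests) a cache that keeps one pool of arrays per distinct size
> and never evicts anything:
>
>     int rows = 1000;
>     long cachedBytes = 0;
>     for (int width = 10; width <= 13; width++) {
>         // each new distinct width adds another pool of 1000 arrays
>         cachedBytes += (long) rows * width; // 10000, 21000, 33000, 46000
>         System.out.println(cachedBytes + " bytes retained");
>     }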
>
> As you can see, over time the amount of memory consumed can easily
> grow large, especially if many queries return thousands of rows of
> differing sized values. The example above is admittedly the worst
> case (it shows no reuse of what is in the cache), but that worst
> case makes the caching feature as currently written a problem for
> me: my connection pool keeps connections open for long periods,
> during which they run a wide variety of SQL statements, many of
> which return large numbers of rows.
>
> Finally, my app runs many queries that select values longer than 256
> bytes. The current cache implementation doesn't help in these cases
> at all (it only caches arrays up to 256 bytes), yet it is these
> large byte[]'s for which the overhead of allocation and garbage
> collection is greatest, and which would benefit most from caching.
>
> I have looked at the source code and thought about how to fix the
> above problems, and I have not been able to come up with anything
> that works well. While it certainly would be possible to code
> something that solves these problems, I fear the result would
> actually be slower than the current situation in 7.0, where no cache
> exists and lots of arrays are allocated and then reclaimed by the
> garbage collector.
>
> I would suggest that the cache code be pulled out for 7.1beta2, until a
> better overall solution is found.
>
> thanks,
> --Barry
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
