Streaming ResultSet

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Streaming ResultSet
Date: 2005-07-19 20:22:25
Message-ID: s2dd1ab9.027@gwmta.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Hello all.

Within the PostgreSQL JDBC driver a ResultSet is, by default, streamed
back to the client as the rows are generated by the server, and are
stored in a collection in RAM. An attempt to get the first row from the
ResultSet blocks until every last row is stored in RAM. If various
conditions are met, a cursor is used instead, which allows a large
result set to be processed without running out of RAM.

I'm curious about the reasons that this appoach was taken -- why not
return the rows as they are retrieved from the server? I can think of
some possible explanations:

(1) The time locks are held on the server is minimized by the default
behavior.

(2) If an exception occurs, you do not receive that data up to the
point of the problem. (For example, divide by zero, serialization
(deadlock) problems, or broken connections.)

(3) The connection can be used for other purposes without conflict.
(Other drivers I've used, which normall stream the results fall back on
reading the whole result set into RAM when a statement is issued which
cannot be run while the result set is pending.)


The disadvantages of the current default approach are:

(1) The entire result set might not fit in the client RAM.

(2) You might want to see partial results from a query which ultimately
generates an exception.

(3) You can't overlap retrieval of the result set with processing of
the rows.


The disadvantages of the cursored approach are:

(1) It might not be easy to create all of the right conditions for the
cursored approach to work.

(2) You might think you've met all the conditions for the cursored
approach and be wrong. The default behavior might kick in and cause an
OutOfMemoryError.

(3) There is overhead to creating a cursor for a select statement.
Granted, in my tests it was only about 20 microseconds, but that boosted
runtime for my simple test case by 3%.


If there was a connection property which allowed result sets to return
rows as they are retrieved, I would use it. I'm not lobbying very hard
for it, however, since it is only 20 microseconds per SELECT statement
to use cursors, and I can easily modify my code to use them, although it
will have to be an "all or nothing" change in the framework, so the cost
is paid on every SELECT statement. I'm just expressing interest, should
the feature be under consideration.

-Kevin

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Oliver Jowett 2005-07-19 22:48:09 Re: Timestamp Conversion Woes Redux
Previous Message Christian Cryder 2005-07-19 16:32:35 Re: Timestamp Conversion Woes Redux