Help with retrieving large results sets and memory usage.

From: Peter(dot)Rupp(at)ual(dot)com
To: pgsql-interfaces(at)postgresql(dot)org
Cc: Peter(dot)Rupp(at)ual(dot)com
Subject: Help with retrieving large results sets and memory usage.
Date: 2003-08-18 14:44:11
Message-ID: H0002de120a5a419.1061217851.awhq6377@MHS
Lists: pgsql-interfaces

Hi everybody,
I have several general and specific questions regarding the protocol
that postgresql uses to send large result sets to client processes over
a TCP network connection.

I'm connecting to a 7.3.4 Postgres database running on HP-UX 11i. I'm
connecting over the network in two ways: 1) from Java, using the JDBC
type-4 driver built in the postgres distribution, and 2) from C-based
Python, using the PyGreSQL interface module (pg), also supplied and
built from the 7.3.4 postgres distribution.

When using either interface, I notice that when I select rows from a
table, the client interface appears to retrieve the entire result set
at once and load it into memory on the client machine. When I recently
had to retrieve all rows in a table (the data size totalled 110 MB),
memory usage in the JVM jumped to roughly 330 MB; in Python it went to
440-450 MB. I observed the memory usage with a fairly sophisticated
performance-monitoring tool (HP's Glance/PerfView).

I understand that every interface needs to keep a copy of the data
around somewhere, so this is understandable. However, there are times
when I will need to select a large amount of data -- thousands or
millions of rows, where the data returned could easily exceed the
memory size of our machine.

As a work-around to this issue, I have looked at the following:

1) I could break the SQL statement up into multiple calls, attempting
   to limit the size of the result set retrieved each time. However,
   this proves very difficult (if not impossible) for several of our
   tables.
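For reference, here is roughly the shape of what I tried for option 1
(an untested sketch -- the table name, key column, and batch size are
made up for illustration). Paging on a unique key rather than OFFSET
keeps each query cheap:

```python
def fetch_in_batches(run_query, batch_size=500):
    """Page through a table on a unique key so each query returns at
    most batch_size rows (keyset pagination).  run_query(sql) should
    run one statement and return its rows, first column being the
    key -- a stand-in for pg's query()/getresult().
    """
    last_key = None
    while True:
        if last_key is None:
            sql = ("SELECT id, payload FROM big_table "
                   "ORDER BY id LIMIT %d" % batch_size)
        else:
            sql = ("SELECT id, payload FROM big_table "
                   "WHERE id > %d ORDER BY id LIMIT %d"
                   % (last_key, batch_size))
        rows = run_query(sql)
        if not rows:
            break  # no more rows to page through
        for row in rows:
            yield row
        last_key = rows[-1][0]  # resume after the last key seen
```

This only works when the table has a usable unique ordering key, which
is exactly what several of our tables lack.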

2) Perhaps the postgres network protocol and/or the postgres
   'postmaster' daemon allow retrieval of a portion of a result set at
   a time? In other words, is there a way for the client JDBC driver
   or Python module to bring only a portion of the result set into
   memory -- that is, read, say, 500 rows or 50 MB of data from the
   network and hand that off to the client program; when the program
   needs more, the adaptor reads the next 500 rows, or something to
   that effect.
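In case it clarifies what I'm after, here is an untested sketch of
option 2 built on a server-side cursor (DECLARE/FETCH inside a
transaction), which I believe the backend already supports; the
execute() callable and cursor name are placeholders for whatever the
real interface provides:

```python
def iter_with_cursor(execute, query, batch_size=500):
    """Stream a large result set through a server-side cursor.
    execute(sql) runs one statement and returns its rows (a stand-in
    for pg.connect(...).query()).  At most batch_size rows are held
    in client memory at any moment.
    """
    execute("BEGIN")  # cursors live inside a transaction
    execute("DECLARE big_cur CURSOR FOR %s" % query)
    try:
        while True:
            rows = execute("FETCH %d FROM big_cur" % batch_size)
            if not rows:
                break  # cursor exhausted
            for row in rows:
                yield row
    finally:
        execute("CLOSE big_cur")
        execute("COMMIT")
```

If the drivers could do something like this under the covers, the
client's memory footprint would be bounded by the batch size rather
than the result size.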

3) I'm not sure whether JDBC (or any other db connection API) allows
   for the row/size buffering I describe in #2; perhaps this is an
   efficiency left to the implementation. However, I'm proficient
   enough in C and Java to add this functionality, if the postgres
   engine itself will allow it.
   Do you folks publish the network semantics and protocols used to
   make SQL requests and retrieve their results? Any documentation
   about byte-stream structures, etc. would be really helpful; I could
   look at the source code, but if anybody can summarize this, it
   would help me a lot.
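To show the level of detail I'm hoping for: my (possibly wrong)
understanding is that the newer frontend/backend protocol frames each
backend message as a 1-byte type code followed by a 4-byte big-endian
length that includes itself -- though I gather the 7.3-era (v2)
protocol frames things differently. A toy splitter under that
assumption:

```python
import struct

def split_messages(buf):
    """Split a byte buffer into (type, payload) backend messages,
    assuming v3-style framing: 1-byte message type, then an Int32
    length counting itself plus the payload.  Purely illustrative;
    the 7.3 (v2) wire format is laid out differently.
    """
    msgs = []
    pos = 0
    while pos < len(buf):
        mtype = buf[pos:pos + 1]
        (length,) = struct.unpack("!I", buf[pos + 1:pos + 5])
        payload = buf[pos + 5:pos + 1 + length]
        msgs.append((mtype, payload))
        pos += 1 + length  # advance past type byte + framed length
    return msgs
```

Being able to stop reading the socket between messages like this is
what would let an adaptor buffer only part of a result set.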

Thanks very much for your assistance.

Best regards,
==pete==
P. A. Rupp
United Airlines Corp. (WHQKT)
voice: 847-700-3226
email: peter(dot)rupp(at)ual(dot)com
