Re: Retrieving ResultSets with millions of rows -

From: Doug Fields <dfields-postgres(at)pexicom(dot)com>
To: Dave Cramer <Dave(at)micro-automation(dot)net>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: Retrieving ResultSets with millions of rows -
Date: 2002-09-14 00:45:50
Message-ID: 5.1.0.14.2.20020913204314.02ce1720@pop.pexicom.com
Lists: pgsql-jdbc


>Ya, there is; it's a bit of a hack, but the only way around it is to use
>a cursor. Eventually the driver will do this automatically, but for now
>you will have to do it manually.

That's what I was afraid you were going to say. Thanks for the response.

Question: Do I have to do anything unusual with the JDBC Connection and/or
Statement to put it in a certain mode? I'm not well versed in the
interactions between the JDBC API and database transactions (begin/commit,
etc.). I also use the Protomatter connection pool; what do I have to do with
the connection when I release it back to the pool?

Also, a slightly tangential question: if I'm only doing reads in the
transaction (the cursor case), is there any performance difference between
COMMIT and ROLLBACK?
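For what it's worth, my reading of the suggestion is something like the sketch below. The connection URL, credentials, table name, and fetch size are all hypothetical; the one essential point is that autocommit must be off, since a cursor only lives inside a transaction:

```java
import java.sql.*;

public class CursorStream {

    // Build the FETCH statement for one batch; cursor name is illustrative.
    static String fetchSql(int n, String cursor) {
        return "FETCH " + n + " FROM " + cursor;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
        // Cursors only exist inside a transaction, so disable autocommit.
        conn.setAutoCommit(false);
        Statement stmt = conn.createStatement();
        stmt.execute("DECLARE cursor1 CURSOR FOR SELECT * FROM big_table");
        boolean more = true;
        while (more) {
            more = false;
            // Pull 10000 rows at a time instead of the whole result set.
            ResultSet rs = stmt.executeQuery(fetchSql(10000, "cursor1"));
            while (rs.next()) {
                more = true;
                // process rs.getString(1), etc.
            }
            rs.close();
        }
        stmt.execute("CLOSE cursor1");
        conn.commit();            // or rollback(); it's read-only either way
        // Restore autocommit before handing the connection back to a pool.
        conn.setAutoCommit(true);
        stmt.close();
        conn.close();
    }
}
```

Restoring autocommit before releasing the connection is presumably the key pool-hygiene step, so the next borrower doesn't inherit an open transaction.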

Thanks,

Doug

>so
>begin;
>declare cursor1 CURSOR FOR SELECT ....
>fetch n from cursor1 ...
>end;
>
>Dave
>On Fri, 2002-09-13 at 17:43, Doug Fields wrote:
> > Hello,
> >
> > I've just come to the startling realization that the 7.2 JDBC code loads
> > every single row in a ResultSet into a Vector. (core/QueryExecutor.java)
> >
> > Unfortunately, I run some queries which return millions of rows, each of
> > which could well be 100-1000 bytes or more.
> >
> > Hence, I get an OutOfMemoryError.
> >
> > For some queries, there's an obvious workaround: issue the query with an
> > appropriate suffix "ORDER BY <something> LIMIT <something> OFFSET
> > <something>" several times in succession. This will, of course, work mostly
> > fine assuming it's a simple single-table query with an appropriate index
> > (such as a primary key).
> >
> > However, some of my queries are complex one-to-many joins with no
> > particular ordering (or no index for quick ordering). These would be much
> > harder to do that trick with, and/or incur extreme amounts of database
> > overhead in running the query hundreds of times (for example, if I were to
> > handle 10,000 rows at a time).
> >
> > Is there any way to get a ResultSet which is actually live streamed, which
> > keeps no knowledge of previous rows or the row count? It seems utterly
> > wasteful to me that I should need to load millions of rows into a Vector
> > (try an ArrayList next time, for minor speedups) when all I want to do is
> > stream the results, never needing to know the total number of rows ahead
> > of time nor any rows previous (or subsequent) to the "current one" from
> > ResultSet.next().
> >
> > I can't imagine I'm the first person to use JDBC to access tables with
> > tens of millions of rows and attempt to read significant portions of them
> > at a time.
> >
> > Thanks,
> >
> > Doug
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> > http://www.postgresql.org/users-lounge/docs/faq.html
> >
> >
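The LIMIT/OFFSET workaround described in the quoted message might look like the following sketch. Table and key names are illustrative, and as noted above this only behaves well for a simple single-table query with an index on the ordering column:

```java
import java.sql.*;

public class PagedQuery {

    // Append the pagination suffix described above; names are illustrative.
    static String pageSql(String base, String key, int limit, long offset) {
        return base + " ORDER BY " + key + " LIMIT " + limit + " OFFSET " + offset;
    }

    public static void main(String[] args) throws SQLException {
        // Hypothetical connection details.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
        Statement stmt = conn.createStatement();
        int limit = 10000;
        long offset = 0;
        while (true) {
            ResultSet rs = stmt.executeQuery(
                    pageSql("SELECT * FROM big_table", "id", limit, offset));
            int n = 0;
            while (rs.next()) {
                n++;
                // process row
            }
            rs.close();
            if (n < limit) break;  // short page means we've reached the end
            offset += limit;
        }
        stmt.close();
        conn.close();
    }
}
```

Note the drawback Doug raises: each page re-runs the query, so a deep OFFSET forces the server to scan and discard all preceding rows, which is why this degrades badly for complex joins.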
