From: Dave Cramer <davecramer(at)postgres(dot)rocks>
To: Mikko Tiihonen <mikko(dot)tiihonen(at)nitor(dot)com>
Cc: "pgsql-jdbc(at)lists(dot)postgresql(dot)org" <pgsql-jdbc(at)lists(dot)postgresql(dot)org>
Subject: Re: Streaming ResultSet rows instead of buffering them in memory
Date: 2021-01-12 13:51:52
Message-ID: CADK3HHJzr85hKSXnp5y6mqs-NgiSaYtdXL3F+OmgfUdypJvM9w@mail.gmail.com
Lists: pgsql-jdbc

Hi Mikko,

Yes to both. Let me see what I can do.

Dave Cramer
www.postgres.rocks

On Sat, 9 Jan 2021 at 12:10, Mikko Tiihonen <mikko(dot)tiihonen(at)nitor(dot)com>
wrote:

> I have now finished the PR https://github.com/pgjdbc/pgjdbc/pull/1735 and
> fixed all the comments from the earlier discussions.
>
> Would you be open to merging this at some point? At least a new review
> would be appreciated.
>
> -Mikko
>
> ------------------------------
> *From:* Mikko Tiihonen <mikko(dot)tiihonen(at)nitor(dot)com>
> *Sent:* 15 March 2020 14:57
> *To:* pgsql-jdbc(at)lists(dot)postgresql(dot)org <pgsql-jdbc(at)lists(dot)postgresql(dot)org>
> *Subject:* Streaming ResultSet rows instead of buffering them in memory
>
> Hi,
>
> I'd like to propose that pgjdbc be modified to avoid buffering ResultSets
> in memory by default, whenever it is safe to do so.
> I created a WIP PR that works for many basic scenarios:
> https://github.com/pgjdbc/pgjdbc/pull/1735
>
> I tried to make a very minimal change that should still be maintainable in
> the long term. Basically I just moved the protocol-parsing variables into a
> state object and drag it along when processing the next row of the
> ResultSet. The effect on applications is that queries have an effective
> fetch size of 1, without any of the downsides of such a setting.
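>
> As a very rough sketch of the idea (class and method names here are
> hypothetical, not the actual pgjdbc internals): the parsing position is
> kept in a small state object, and exactly one DataRow message is parsed
> per ResultSet.next() call instead of draining the whole result set up
> front:
>
>     // Hypothetical sketch only; pgjdbc's real classes look different.
>     final class RowStreamState {
>         final java.io.DataInputStream in; // positioned at the next protocol message
>         boolean done;                     // set once the rows have ended
>         RowStreamState(java.io.DataInputStream in) { this.in = in; }
>     }
>
>     /** Returns the raw payload of the next DataRow, or null when the rows end. */
>     static byte[] readNextRow(RowStreamState state) throws java.io.IOException {
>         if (state.done) {
>             return null;
>         }
>         int type = state.in.read();      // message type, e.g. 'D' (DataRow)
>         int length = state.in.readInt(); // message length, includes itself
>         byte[] payload = new byte[length - 4];
>         state.in.readFully(payload);
>         if (type == 'D') {
>             return payload;              // hand one row to the caller
>         }
>         state.done = true;               // e.g. 'C' CommandComplete ends the rows
>         return null;
>     }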
>
> The motivation for the patch is to speed up queries and lower memory
> usage. The queries become faster indirectly since the query processing is
> parallelized. Currently the steps happen in sequence:
> 1) backend returns results and the JDBC driver parses and buffers the
> result set (which can take a while)
> 2) application consumes the result set
>
> With this patch the steps occur concurrently (see the sketch below):
> 1) backend returns results
> 2) application consumes the result set, with the JDBC driver parsing rows
> from the TCP buffer on demand
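>
> From the application's point of view nothing changes; a plain
> forward-only query simply starts producing rows as soon as the first
> ones arrive. A minimal usage sketch (the table, columns and process()
> call are placeholders):
>
>     import java.sql.*;
>
>     void consume(Connection con) throws SQLException {
>         try (Statement stmt = con.createStatement();
>              ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
>             while (rs.next()) {      // each call parses one more row on demand
>                 process(rs.getLong(1), rs.getString(2));
>             }
>         }
>     }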
>
> The cursors used by enabling a fetch size do a similar thing (see the
> example below), but
> - it tells the backend to optimize the query plan for fetching only the
> first few rows instead of the full result set
> - each batch of rows requires a round-trip between the backend and the
> application, during which neither does any useful work
> - a small fetch size causes many round-trips, while a large fetch size
> uses memory and increases latency
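>
> For comparison, this is roughly what cursor-based fetching looks like
> today; pgjdbc only uses a cursor when autocommit is off and the result
> set is forward-only (table, columns and process() are again placeholders):
>
>     con.setAutoCommit(false);                // required for cursor-based fetching
>     try (Statement stmt = con.createStatement()) {
>         stmt.setFetchSize(50);               // rows fetched per round-trip
>         try (ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
>             while (rs.next()) {
>                 process(rs.getLong(1), rs.getString(2));
>             }
>         }
>         con.commit();
>     }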
>
> The downside is that if the application does not consume the result set
> fast enough, the backend cannot release its locks as quickly. But this
> only affects large results that do not fit entirely into the TCP buffers.
>
> The WIP code sometimes fails if the ResultSet is not properly closed
> before the next statement is executed. For many tests this happens in the
> tearDown phase. I also think that things might currently not work correctly
> if the application triggers queries on the same connection while processing
> the results, for example by requesting metadata (see the sketch below). In
> that case we have no other option than to fall back to buffering to clear
> the connection for reuse.
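>
> The problematic pattern looks roughly like this (again just a sketch with
> placeholder names); the second query forces the driver to either buffer
> or drain the remaining rows of rs before the connection can be reused:
>
>     try (Statement stmt = con.createStatement();
>          ResultSet rs = stmt.executeQuery("SELECT id, payload FROM big_table")) {
>         while (rs.next()) {
>             // A metadata lookup on the same connection while rs is still streaming:
>             try (ResultSet cols = con.getMetaData()
>                      .getColumns(null, null, "big_table", null)) {
>                 // ...
>             }
>         }
>     }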
>
> -Mikko
>
>
