Re: statement_timeout affects query results fetching?

From: Shay Rojansky <roji(at)roji(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: statement_timeout affects query results fetching?
Date: 2015-08-11 11:58:43
Message-ID: CADT4RqDhytLw5Tk+XJ6BwJOyn=-e-RkPZ4tNjWVd+U-f18WH1w@mail.gmail.com
Lists: pgsql-hackers

Thanks (once again!) for the valuable suggestions, Robert.

The idea of chunking/buffering via cursors has been raised before for
another purpose: allowing multiple queries to run "concurrently" at the API
level (where "concurrently" means interleaving reads of their resultsets).
This would imply exposing the number of rows fetched to the user, as you
suggested. However, I don't think we can remove the API option to *not*
buffer (as I said before, ADO.NET even provides a standard API feature for
reading column-by-column), and therefore the general problem remains...

I think the right solution for us at the driver level would be to switch to
driver-enforced timeouts, i.e. to stop using statement_timeout and instead
look at socket read times. I'll look into doing that for our next version.
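For the curious, a driver-enforced timeout of the SqlClient flavor quoted
below (cumulative across all network reads, excluding user processing time)
might look roughly like this. This is just an illustrative sketch, not
Npgsql code; the helper name and shape are made up:

```python
import socket
import time

def read_exactly(sock, nbytes, total_timeout):
    """Read exactly nbytes from the server socket, enforcing a cumulative
    timeout across all network reads.  Time spent by the caller processing
    rows between calls is not counted; only time blocked in recv() is."""
    deadline = time.monotonic() + total_timeout
    buf = bytearray()
    while len(buf) < nbytes:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                "command timed out after %.1fs of network reads" % total_timeout)
        # Per-recv timeout is whatever budget is left of the total.
        sock.settimeout(remaining)
        try:
            chunk = sock.recv(nbytes - len(buf))
        except socket.timeout:
            raise TimeoutError(
                "command timed out after %.1fs of network reads" % total_timeout)
        if not chunk:
            raise ConnectionError("server closed the connection")
        buf += chunk
    return bytes(buf)
```

The key property is that the timeout keeps ticking between rows, so a
query that returns its first row quickly but then stalls mid-resultset
still times out, which statement_timeout does not cover once the portal
has started returning data.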

Thanks for all your thoughts!

On Mon, Aug 10, 2015 at 2:30 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Aug 10, 2015 at 5:25 AM, Shay Rojansky <roji(at)roji(dot)org> wrote:
> > Thanks for the explanation Robert, that makes total sense. However, it
> > seems like the utility of PG's statement_timeout is much more limited
> > than I thought.
> >
> > In case you're interested, I dug a little further and it seems that
> > Microsoft's client for SQL Server implements the following timeout
> > (source):
> >
> > cumulative time-out (for all network packets that are read during the
> > invocation of a method) for all network reads during command execution
> > or processing of the results. A time-out can still occur after the
> > first row is returned, and does not include user processing time, only
> > network read time.
> >
> > Since it doesn't seem possible to have a clean query-processing-only
> > timeout at the backend, we may be better off doing something similar
> > to the above and enforce timeouts on the client only. Any further
> > thoughts on this would be appreciated.
>
> An alternative you may want to consider is using the Execute message
> with a non-zero row count and reading all of the returned rows as they
> come back, buffering them in memory. When those have all been
> consumed, issue another Execute message and get some more rows.
>
> AFAICS, the biggest problem with this is that there's no good way to
> bound the number of rows returned by size rather than by number, which
> has been complained about before by somebody else in a situation
> similar to yours. Another problem is that I believe it will cause
> cursor_tuple_fraction to kick in, which may change query plans. But
> it does have the advantage that the query will be suspended from the
> server's point of view, which I *think* will toll statement_timeout.
>
> You might also consider exposing some knobs to the user, so that they
> can set the number of rows fetched in one go, and let that be all the
> rows or only some of them.
>
> We really need a better way of doing this, but I think this is the way
> other drivers are handling it now.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
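To make Robert's Execute-with-row-count suggestion concrete, the driver-side
loop is roughly the following. Here execute_portal is a hypothetical
callable standing in for the protocol-level exchange: one Execute message
with a row limit, returning the rows received plus whether the server sent
PortalSuspended (i.e. more rows remain):

```python
def fetch_in_chunks(execute_portal, chunk_size=1000):
    """Consume a portal in chunks of up to chunk_size rows.

    execute_portal(n) is assumed to send one Execute message with a row
    limit of n and return (rows, suspended), where suspended is True when
    the server answered with PortalSuspended rather than CommandComplete.
    While suspended, the query is paused from the server's point of view.
    """
    while True:
        rows, suspended = execute_portal(chunk_size)
        yield from rows          # hand rows to the user as they arrive
        if not suspended:
            return               # CommandComplete: portal is exhausted
```

The chunk_size parameter is exactly the knob Robert mentions exposing to
users, with the caveats he notes: it bounds rows by count rather than by
size, and going through a portal this way may let cursor_tuple_fraction
influence the plan.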
