Re: [patch] libpq one-row-at-a-time API

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [patch] libpq one-row-at-a-time API
Date: 2012-08-02 21:44:11
Message-ID: CACMqXC+QKTjesX=OMX4ZFznAzOvj6RnzpG3tVnY50y932=Ziqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 3, 2012 at 12:01 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
>> On Thu, Aug 2, 2012 at 8:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> In principle I suppose we could hack PQconsumeInput enough that it
>>> didn't damage the row buffer while still meeting its API contract of
>>> clearing the read-ready condition on the socket; but it wouldn't be
>>> simple or cheap to do so.
>
>> Any code using single-row-mode is new. Any code calling consumeInput
>> before fully draining rows from buffer is buggy, as it allows unlimited growth
>> of network buffer, which breaks whole reason to use single-row mode.
>
> Sorry, that argument will not fly. The API specification for
> PQconsumeInput is that it can be called at any time to drain the kernel
> input buffer, without changing the logical state of the PGconn. It's
> not acceptable to insert a parenthetical "oh, but you'd better not try
> it in single-row mode" caveat.

Well, if the old API contract must be kept, then so be it.

> The reason I find this of importance is that PQconsumeInput is meant to
> be used in an application's main event loop for the sole purpose of
> clearing the read-ready condition on the connection's socket, so that
> it can wait for other conditions. This could very well be decoupled
> from the logic that is involved in testing PQisBusy and
> fetching/processing actual results. (If we had not intended to allow
> those activities to be decoupled, we'd not have separated PQconsumeInput
> and PQisBusy in the first place.) Introducing a dependency between
> these things could lead to non-reproducible, timing-dependent, very
> hard-to-find bugs.
>
> By the same token, if somebody were trying to put single-row-mode
> support into an existing async application, they'd be fooling with the
> parts that issue new queries and collect results, but probably not with
> the main event loop. Thus, I do not believe your argument above that
> "any code using single-row mode is new". The whole app is not going to
> be new. It could be very hard to guarantee that there is no path of
> control that invokes PQconsumeInput before the new data is actually
> processed, because there was never before a reason to avoid extra
> PQconsumeInput calls.

I've thought about this. On first glance indeed, if async app
has both reader and writer registered in event loop, it might be
simpler to keep reading from source socket, event if writer stalls.

But this is exactly the situation where error from consumeInput would help.
Imagine fast source and slow target (common scenario - a database query
for each row). Reading from source *must* be stopped to get
predictable memory usage,.

> As I said, this is the exact same objection I had to the original scheme
> of exposing the row buffer directly. Putting a libpq function in front
> of the row buffer doesn't solve the problem if that function is
> implemented in a way that's still vulnerable to misuse or race conditions.
>
>> And now libpq allows such apps to go from 2x row size to full resultset
>> size with unfortunate coding. Why?
>
> This argument is completely nonsensical. The contract for
> PQconsumeInput has always been that it consumes whatever the kernel has
> available. If you don't extract data from the library before calling it
> again, the library's internal buffer may grow to more than the minimum
> workable size, but that's a tradeoff you may be willing to make to
> simplify your application logic. I do not think that it's an
> improvement to change the API spec to say either that you can't call
> PQconsumeInput in certain states, or that you can but it won't absorb
> data from the kernel.

Old patch had a nice property that we could replace PQgetResult()
with something else. To allow that and also PQconsumeInput(),
we could store offsets in rowBuf, and then skip realign in PQconsumeInput().

Not sure if the replacement reason is worth keeping, but the
offsets may be useful even now as they might give additional
speedup as they decrease the per-column storage
from 16 to 8 bytes on 64-bit architectures.

May be better left for 9.3 though...

--
marko

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-08-03 03:09:51 Re: WIP pgindent replacement
Previous Message Tom Lane 2012-08-02 21:01:07 Re: [patch] libpq one-row-at-a-time API