Re: Pipeline mode and PQpipelineSync()

From: Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com>
To: Boris Kolpackov <boris(at)codesynthesis(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pipeline mode and PQpipelineSync()
Date: 2021-06-22 22:14:52
Message-ID: 202106222214.ptjfmstb23mu@alvherre.pgsql
Lists: pgsql-hackers

On 2021-Jun-21, Boris Kolpackov wrote:

> Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com> writes:
>
> > I think I should rephrase this to say that PQpipelineSync() is needed
> > where the user needs the server to start executing commands; and
> > separately indicate that it is possible (but not promised) that the
> > server would start executing commands ahead of time because $reasons.
>
> I think always requiring PQpipelineSync() is fine since it also serves
> as an error recovery boundary. But the fact that the server waits until
> the sync message to start executing the pipeline is surprising. To me
> this seems to go contrary to the idea of a "pipeline".

But does that actually happen? There's a very easy test we can do by
sending queries that sleep. If my libpq program sends "SELECT
pg_sleep(2)", calls PQflush(), sleeps in the client for two more
seconds without sending the sync, and only *then* sends the sync, I find
that the program takes two seconds, not four. This shows that client and
server slept in parallel, even though I didn't send the Sync until after
the client was done sleeping.

In order to see this, I patched libpq_pipeline.c with the attached, and
ran it under time:

time ./libpq_pipeline simple_pipeline -t simple.trace
simple pipeline... sent and flushed the sleep. Sleeping 2s here:
client sleep done
ok

real 0m2,008s
user 0m0,000s
sys 0m0,003s

So I see things happening as you describe in (1):

> In fact, I see the following ways the server could behave:
>
> 1. The server starts executing queries and sending their results before
> receiving the sync message.

I am completely at a loss as to how to explain a server that behaves in
any other way, given how the protocol is designed. There is no buffering
on the server side.
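
To make the timing argument concrete, here is a toy model in plain POSIX C.
It is not PostgreSQL code: a child process stands in for the server, a pipe
stands in for the socket, and the names run_demo and toy_server as well as
the 'Q', 'S' and 'Z' message bytes are invented for illustration. The
"eager" mode executes each query as soon as it is read; the other mode is a
hypothetical server that holds queries until the sync.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in milliseconds. */
static long now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

static void sleep_ms(long ms) {
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;                               /* resume with the remaining time */
}

/* Toy server loop, run in the child. 'Q' = query, 'S' = sync. */
static void toy_server(int in_fd, int out_fd, long work_ms, int eager) {
    char msg;
    int pending = 0;
    while (read(in_fd, &msg, 1) == 1) {
        if (msg == 'Q') {
            if (eager)
                sleep_ms(work_ms);      /* execute on receipt */
            else
                pending++;              /* hold until the sync */
        } else if (msg == 'S') {
            while (pending-- > 0)
                sleep_ms(work_ms);      /* run anything that was held */
            if (write(out_fd, "Z", 1) != 1)
                _exit(1);
            break;
        }
    }
}

/* Send one query, sleep in the client, then sync and wait for the reply.
 * Returns the elapsed wall time in ms, or -1 on error. */
long run_demo(long work_ms, long client_ms, int eager) {
    int to_srv[2], from_srv[2];
    assert(work_ms >= 0 && client_ms >= 0);
    if (pipe(to_srv) != 0 || pipe(from_srv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                     /* child: the toy server */
        close(to_srv[1]);
        close(from_srv[0]);
        toy_server(to_srv[0], from_srv[1], work_ms, eager);
        _exit(0);
    }
    close(to_srv[0]);
    close(from_srv[1]);

    long start = now_ms();
    char reply = 0;
    int ok = write(to_srv[1], "Q", 1) == 1;     /* query sent and "flushed" */
    sleep_ms(client_ms);                        /* client sleeps before sync */
    ok = ok && write(to_srv[1], "S", 1) == 1;
    ok = ok && read(from_srv[0], &reply, 1) == 1 && reply == 'Z';
    long elapsed = now_ms() - start;

    close(to_srv[1]);
    close(from_srv[0]);
    waitpid(pid, NULL, 0);
    return ok ? elapsed : -1;
}
```

In this model, run_demo(2000, 2000, 1) should take about as long as the
longer of the two sleeps, while run_demo(2000, 2000, 0) should take about
their sum. The two-second run reported above matches the eager case.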

> While it can be tempting to say that this is an implementation detail,
> this affects the way one writes a client. For example, I currently have
> the following comment in my code:
>
> // Send queries until we get blocked. This feels like a better
> // overall strategy to keep the server busy compared to sending one
> // query at a time and then re-checking if there is anything to read
> // because the results of INSERT/UPDATE/DELETE are presumably small
> // and quite a few of them can get buffered before the server gets
> // blocked.
>
> This would be a good strategy for behavior (1) but not (3) (where it
> would make more sense to queue the queries on the client side).

Agreed, that's the kind of strategy I would have thought was the most
reasonable, given my understanding of how the protocol works.

I wonder if your program is being affected by something else. Maybe the
socket is nonblocking (though I don't quite understand how that would
affect the client behavior in just this way), or your program is
buffering elsewhere. I don't do C++ much so I can't help you with that.

> So I think it would be useful to clarify the server behavior and
> specify it in the documentation.

I'll see about improving the docs on these points.

> > Do I have it right that other than this documentation problem, you've
> > been able to use pipeline mode successfully?
>
> So far I've only tried it in a simple prototype (single INSERT statement).
> But I am busy plugging it into ODB's bulk operation support (that we
> already have for Oracle and MSSQL) and once that's done I should be
> able to exercise things in more meaningful ways.

Fair enough.

--
Álvaro Herrera 39°49'30"S 73°17'W
