From: | Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com> |
---|---|
To: | Boris Kolpackov <boris(at)codesynthesis(dot)com> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Pipeline mode and PQpipelineSync() |
Date: | 2021-06-22 22:14:52 |
Message-ID: | 202106222214.ptjfmstb23mu@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2021-Jun-21, Boris Kolpackov wrote:
> Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com> writes:
>
> > I think I should rephrase this to say that PQpipelineSync() is needed
> > where the user needs the server to start executing commands; and
> > separately indicate that it is possible (but not promised) that the
> > server would start executing commands ahead of time because $reasons.
>
> I think always requiring PQpipelineSync() is fine since it also serves
> as an error recovery boundary. But the fact that the server waits until
> the sync message to start executing the pipeline is surprising. To me
> this seems to go contrary to the idea of a "pipeline".
But does that actually happen? There's a very easy test we can do by
sending queries that sleep. If my libpq program sends "SELECT
pg_sleep(2)", calls PQflush(), sleeps in the client for two more
seconds without sending the Sync, and only *then* sends the Sync, I find
that the program takes two seconds, not four. This shows that client and
server slept in parallel, even though I didn't send the Sync until after
the client was done sleeping.
In order to see this, I patched libpq_pipeline.c with the attached, and
ran it under time:
    time ./libpq_pipeline simple_pipeline -t simple.trace
    simple pipeline... sent and flushed the sleep. Sleeping 2s here:
    client sleep done
    ok

    real 0m2,008s
    user 0m0,000s
    sys 0m0,003s
So I see things happening as you describe in (1):
> In fact, I see the following ways the server could behave:
>
> 1. The server starts executing queries and sending their results before
> receiving the sync message.
I am completely at a loss on how to explain a server that behaves in any
other way, given how the protocol is designed. There is no buffering on
the server side.
> While it can be tempting to say that this is an implementation detail,
> this affects the way one writes a client. For example, I currently have
> the following comment in my code:
>
> // Send queries until we get blocked. This feels like a better
> // overall strategy to keep the server busy compared to sending one
> // query at a time and then re-checking if there is anything to read
> // because the results of INSERT/UPDATE/DELETE are presumably small
> // and quite a few of them can get buffered before the server gets
> // blocked.
>
> This would be a good strategy for behavior (1) but not (3) (where it
> would make more sense to queue the queries on the client side).
Agreed, that's the kind of strategy I would have thought was the most
reasonable, given my understanding of how the protocol works.
I wonder if your program is being affected by something else. Maybe the
socket is nonblocking (though I don't quite understand how that would
affect the client behavior in just this way), or your program is
buffering elsewhere. I don't do C++ much so I can't help you with that.
> So I think it would be useful to clarify the server behavior and
> specify it in the documentation.
I'll see about improving the docs on these points.
> > Do I have it right that other than this documentation problem, you've
> > been able to use pipeline mode successfully?
>
> So far I've only tried it in a simple prototype (single INSERT statement).
> But I am busy plugging it into ODB's bulk operation support (that we
> already have for Oracle and MSSQL) and once that's done I should be
> able to exercise things in more meaningful ways.
Fair enough.
--
Álvaro Herrera 39°49'30"S 73°17'W