Re: Pipeline mode and PQpipelineSync()

From: Boris Kolpackov <boris(at)codesynthesis(dot)com>
To: Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pipeline mode and PQpipelineSync()
Date: 2021-06-21 08:38:20
Message-ID: boris.20210621100918@codesynthesis.com
Lists: pgsql-hackers

Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com> writes:

> I think I should rephrase this to say that PQpipelineSync() is needed
> where the user needs the server to start executing commands; and
> separately indicate that it is possible (but not promised) that the
> server would start executing commands ahead of time because $reasons.

I think always requiring PQpipelineSync() is fine since it also serves
as an error recovery boundary. But the fact that the server waits for
the sync message before it starts executing the pipeline is surprising.
To me this seems to run contrary to the idea of a "pipeline".

In fact, I see the following ways the server could behave:

1. The server starts executing queries and sending their results before
receiving the sync message.

2. The server starts executing queries before receiving the sync message
but buffers the results until it receives the sync message.

3. The server buffers the queries and only starts executing them and
sending the results after receiving the sync message.

My observations suggest that the server behaves as in (3), but it could
also be (2).

While it can be tempting to say that this is an implementation detail,
it affects the way one writes a client. For example, I currently have
the following comment in my code:

// Send queries until we get blocked. This feels like a better
// overall strategy to keep the server busy compared to sending one
// query at a time and then re-checking if there is anything to read
// because the results of INSERT/UPDATE/DELETE are presumably small
// and quite a few of them can get buffered before the server gets
// blocked.

This would be a good strategy for behavior (1) but not for (3), where
it would make more sense to queue the queries on the client side. So I
think it would be useful to clarify the server behavior and specify
it in the documentation.

> Do I have it right that other than this documentation problem, you've
> been able to use pipeline mode successfully?

So far I've only tried it in a simple prototype (a single INSERT
statement). But I am busy plugging it into ODB's bulk operation support
(which we already have for Oracle and MSSQL), and once that's done I
should be able to exercise things in more meaningful ways.
