Re: Pipeline mode and PQpipelineSync()

From: Boris Kolpackov <boris(at)codesynthesis(dot)com>
To: Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Pipeline mode and PQpipelineSync()
Date: 2021-06-23 08:37:22
Message-ID: boris.20210623100839@codesynthesis.com
Lists: pgsql-hackers

Alvaro Herrera <alvaro(dot)herrera(at)2ndquadrant(dot)com> writes:

> > I think always requiring PQpipelineSync() is fine since it also serves
> > as an error recovery boundary. But the fact that the server waits until
> > the sync message to start executing the pipeline is surprising. To me
> > this seems to go contrary to the idea of a "pipeline".
>
> But does that actually happen? There's a very easy test we can do by
> sending queries that sleep. If my libpq program sends a "SELECT
> pg_sleep(2)", then PQflush(), then sleep in the client program two more
> seconds without sending the sync; and *then* send the sync, I find that
> the program takes 2 seconds, not four. This shows that both client and
> server slept in parallel, even though I didn't send the Sync until after
> the client was done sleeping.

Thanks for looking into it. My experiments were with INSERTs, and I have
now been able to try larger pipelines. I can see that the server starts
sending results after ~400 statements. So I think you are right: the
server does start executing the pipeline before receiving the sync
message. There is still something strange going on, though (probably on
the client side):

I have a pipeline of, say, 500 INSERTs. If I "execute" this pipeline by
first sending all the statements and only then reading the results,
everything works as expected. This is the call sequence I am talking about:

PQsendQueryPrepared() # INSERT #1
PQflush()
PQsendQueryPrepared() # INSERT #2
PQflush()
...
PQsendQueryPrepared() # INSERT #500
PQpipelineSync()
PQflush()
PQconsumeInput()
PQgetResult() # INSERT #1
PQgetResult() # NULL
PQgetResult() # INSERT #2
PQgetResult() # NULL
...
PQgetResult() # INSERT #500
PQgetResult() # NULL
PQgetResult() # PGRES_PIPELINE_SYNC
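
To make the expected ordering above concrete, here is a minimal model of it in plain C. It does not call libpq: the `result_kind` enum and both function names are hypothetical stand-ins introduced only for illustration, and the pattern they encode (one result and one NULL per statement, then a single sync result) is simply the one listed in the call sequence above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for what each PQgetResult() call yields.
 * These are NOT libpq types; they only model the sequence above. */
enum result_kind { RES_COMMAND, RES_NULL, RES_PIPELINE_SYNC };

/* Number of PQgetResult() calls listed above for n statements and one
 * sync: a result plus a NULL per statement, then the sync result. */
size_t expected_result_count(size_t n)
{
    return 2 * n + 1;
}

/* Fill out[] with the expected sequence for n statements followed by one
 * sync. Returns the number of entries written, or 0 if cap is too small. */
size_t expected_results(size_t n, enum result_kind *out, size_t cap)
{
    size_t need = expected_result_count(n);

    assert(out != NULL);
    if (cap < need)
        return 0;

    for (size_t i = 0; i < n; i++) {
        out[2 * i] = RES_COMMAND;   /* result of INSERT #(i+1) */
        out[2 * i + 1] = RES_NULL;  /* NULL that ends that statement */
    }
    out[2 * n] = RES_PIPELINE_SYNC; /* the sync comes last */
    return need;
}
```

For the 500-INSERT pipeline this gives 1001 entries, matching the number of PQgetResult() calls spelled out in the sequence.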

If, however, I execute it by checking for results before sending the
next INSERT, I get the following call sequence:

PQsendQueryPrepared() # INSERT #1
PQflush()
PQsendQueryPrepared() # INSERT #2
PQflush()
...
PQsendQueryPrepared() # INSERT #~400
PQflush()
PQconsumeInput() # At this point select() indicates we can read.
PQgetResult() # NULL (???)
PQgetResult() # INSERT #1
PQgetResult() # NULL
PQgetResult() # INSERT #2
PQgetResult() # NULL
...

What's strange here is that the first PQgetResult() call (marked with ???)
returns NULL instead of the result for INSERT #1, as it does in the first
call sequence. Interestingly, if I skip it, the rest seems to progress as
expected.
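
One way to pin down where an observed sequence departs from the expected one is a small checker, sketched here in plain C. It does not call libpq: the `result_kind` enum and `first_deviation()` are hypothetical names used only for illustration, and the "expected" pattern is assumed to be the one from the first call sequence (a result followed by NULL for each statement, with a sync result allowed only between statements).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for what each PQgetResult() call yields.
 * These are NOT libpq types; they only model the sequences above. */
enum result_kind { RES_COMMAND, RES_NULL, RES_PIPELINE_SYNC };

/* Return the index of the first observed entry that does not fit the
 * assumed pattern, or len if every entry conforms. */
size_t first_deviation(const enum result_kind *seen, size_t len)
{
    int expect_null = 0; /* set right after a statement's result */

    assert(seen != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (expect_null) {
            if (seen[i] != RES_NULL)
                return i;          /* statement not terminated by NULL */
            expect_null = 0;
        } else if (seen[i] == RES_COMMAND) {
            expect_null = 1;       /* its terminating NULL must follow */
        } else if (seen[i] != RES_PIPELINE_SYNC) {
            return i;              /* stray NULL where a result was due */
        }
    }
    return len;
}
```

Fed the second sequence (NULL, result, NULL, result, NULL, ...), the checker reports index 0, the call marked with ???. Fed the same sequence with that first entry skipped, it reports no deviation, which is the behavior described above.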

Any idea what might be going on here? My hunch is that there is an issue
with libpq's state machine. In particular, in the second case, PQgetResult()
is called before the sync message is sent. Did you have a chance to test
such a scenario (i.e., a large pipeline where the first result is processed
before the PQpipelineSync() call)? Of course, this could very well be a bug
on my side or me misunderstanding something.
