Re: PATCH: Batch/pipelining support for libpq

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Matthieu Garrigues <matthieu(dot)garrigues(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Dave Cramer <davecramer(at)postgres(dot)rocks>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Vaishnavi Prabakaran <vaishnaviprabakaran(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Subject: Re: PATCH: Batch/pipelining support for libpq
Date: 2020-11-14 00:37:53
Message-ID: 20201114003753.GA30616@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's a v25.

I made a few more changes to the docs per David's suggestions; I also
reordered the sections quite a bit. It's now like this:
* Batch Mode
* Using Batch Mode
* Issuing Queries
* Processing Results
* Error Handling
* Interleaving Result Processing and Query Dispatch
* Ending Batch Mode
* Functions Associated with Batch Mode
* When to Use Batching

To me as a reader, this makes more sense, but if you disagree, I think
we should discuss further changes. (For example, maybe we should move
the "Functions" section at the end?) The original had "When to Use
Batching" at the start, but it seemed to me that the points it's making
are not as critical as understanding what it is.

I reworked the test program to better fit the TAP model; I found that if
one test mecha failed in whatever way, the connection would be in a
weird state and cause the next test to also fail. I changed so that it
runs one test and exits; then the t/001_libpq_async.pl file (hmm, need
to rename to 001_batch.pl I guess) calls it once for each test.
I adapted the test code to our code style. I also removed the "timings"
stuff; I think that's something better left to pgbench.

(I haven't looked at Daniel's pgbench stuff yet, but I will do that
next.)

While looking at how the tests worked, I gave a hard stare at the new
libpq code and cleaned it up also. There's a lot of minor changes, but
nothing truly substantial. I moved the code around a lot, to keep
things where grouped together they belong.

I'm not 100% clear on things like PGconn->batch_status and how
PGconn->asyncStatus works. Currently everything seems to work
correctly, but I'm worried that because we add new status values to
asyncStatus, some existing code might not be doing everything correctly
(for example when changing to/from ASYNC_BUSY in some cases, are we 100%
we shouldn't be changing to ASYNC_QUEUED?)

While looking this over I noticed a thread from 2014[1] where Matt
Newell tried to implement this stuff and apparently the main review
comment he got before abandoning the patch was that the user would like
a way to access the query that corresponded to each result. The current
patch set does not address that need; the approach to this problem is
that it's on the application's head to keep track of this. Honestly I
don't understand why it would be otherwise ... I'm not sure that it
makes sense to expect that the application is stupid enough that it
doesn't keep track in which order it sent things, but bright enough to
keep pointers to the queries it sent (??). So this seems okay to me.
But added Heikki and Claudio to CC because of that old thread.

[1] https://postgr.es/m/2059418.jZQvL3lH90@obsidian

Attachment Content-Type Size
v25-libpq-batch.patch text/x-diff 85.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-11-14 02:43:58 Re: Misc typos
Previous Message Tomas Vondra 2020-11-13 22:00:56 Re: Zedstore - compressed in-core columnar storage