
Re: [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [COMMITTERS] pgsql: Use asynchronous connect API in libpqwalreceiver
Date: 2017-03-15 16:55:49
Message-ID: 7295.1489596949@sss.pgh.pa.us
Lists: pgsql-committers, pgsql-hackers
Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com> writes:
> On 03/03/2017 11:11 PM, Tom Lane wrote:
>> Yeah, I was wondering if this is just exposing a pre-existing bug.
>> However, the "normal" path operates by repeatedly invoking PQconnectPoll
>> (cf. connectDBComplete) so it's not immediately obvious how such a bug
>> would've escaped detection.

> (After a long period of fruitless empirical testing I turned to the code)
> Maybe I'm missing something, but connectDBComplete() handles a return of
> PGRES_POLLING_OK as a success while connectDBStart() seems not to. I
> don't find anywhere in our code other than libpqwalreceiver that
> actually uses that interface, so it's not surprising if it's now
> failing. So my bet is it is indeed a long-standing bug.

Meh ... that argument doesn't hold water, because the old code here called
PQconnectdbParams which is just PQconnectStartParams then
connectDBComplete.  So the problem cannot be in connectDBStart; that's
common to both paths.  It has to be some discrepancy between what
connectDBComplete does and what the new loop in libpqwalreceiver is doing.

The original loop coding in 1e8a85009 was not very close to the documented
spec for PQconnectPoll at all, and while e434ad39a made it closer, it's
still not really the same: connectDBComplete doesn't call PQconnectPoll
until the socket is known read-ready or write-ready.  The walreceiver loop
does not guarantee that; it would make an additional call after any other
random wakeup.  It's not very clear why bowerbird, and only
bowerbird, would be seeing such wakeups --- but I'm having a really hard
time seeing any other explanation for the change in behavior.  (I wonder
whether bowerbird is telling us that WaitLatchOrSocket can sometimes
return prematurely on Windows.)
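
To make the difference between the two loops concrete, here is a minimal,
libpq-free sketch in C.  It is not the walreceiver code: PollStatus, the
WAKE_* bits, socket_ready_for() and count_polls() are all hypothetical names
invented for illustration, loosely modeled on PQconnectPoll's
PGRES_POLLING_READING/PGRES_POLLING_WRITING results and on the WL_* wakeup
flags.  It also assumes, for simplicity, that the wanted direction stays the
same across wakeups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for libpq's polling status values. */
typedef enum { POLL_FAILED, POLL_READING, POLL_WRITING, POLL_OK } PollStatus;

/* Hypothetical wakeup bits, loosely modeled on the WL_* event flags. */
enum { WAKE_LATCH = 1, WAKE_READABLE = 2, WAKE_WRITEABLE = 4 };

/* True only if the wakeup reports the readiness the previous poll asked for. */
static bool
socket_ready_for(PollStatus wanted, int events)
{
    if (wanted == POLL_READING)
        return (events & WAKE_READABLE) != 0;
    if (wanted == POLL_WRITING)
        return (events & WAKE_WRITEABLE) != 0;
    return false;
}

/*
 * Count how many poll calls a loop would make over a sequence of wakeups.
 * strict = true:  poll only when the socket is ready in the wanted direction.
 * strict = false: poll after every wakeup, whatever caused it.
 */
static int
count_polls(PollStatus wanted, const int *wakeups, size_t n, bool strict)
{
    int polls = 0;

    assert(wakeups != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (!strict || socket_ready_for(wanted, wakeups[i]))
            polls++;
    }
    return polls;
}
```

With the wakeup sequence { WAKE_LATCH, WAKE_LATCH, WAKE_READABLE } and a
connection waiting to read, the strict loop polls once and the non-strict
loop polls three times; the two extra calls are the ones the documented
PQconnectPoll usage does not allow for.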

I'm also pretty sure that the ResetLatch call is in the wrong place, which
could lead to missed wakeups, though that's the opposite of the immediate
problem.
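
The message does not say where the ResetLatch call sat, so the following is
only a sketch of the general hazard, under the assumption that the reset came
after the check for pending work rather than before it.  FakeProc, notify()
and both test functions are hypothetical; the real latch API is not used
here, and the race is replayed deterministically in a single thread.

```c
#include <stdbool.h>

/* Hypothetical single-threaded model of a process with a latch. */
typedef struct
{
    bool latch_set;
    bool work_pending;
} FakeProc;

/* What a waker does: publish the work first, then set the latch. */
static void
notify(FakeProc *p)
{
    p->work_pending = true;
    p->latch_set = true;
}

/*
 * Reset AFTER the check.  A notification landing between the check and the
 * reset is wiped out.  Returns true if the next wait would sleep although
 * work is pending, i.e. the wakeup was missed.
 */
static bool
reset_after_check_misses(void)
{
    FakeProc p = { false, false };
    bool saw_work = p.work_pending;     /* check: nothing yet */

    notify(&p);                         /* notification arrives here */
    p.latch_set = false;                /* misplaced reset discards it */
    return !saw_work && p.work_pending && !p.latch_set;
}

/*
 * Reset BEFORE the check.  The same notification leaves the latch set, so
 * the next wait returns immediately and the work is seen on the next pass.
 */
static bool
reset_before_check_misses(void)
{
    FakeProc p = { false, false };
    bool saw_work;

    p.latch_set = false;                /* reset first */
    saw_work = p.work_pending;          /* check: nothing yet */
    notify(&p);                         /* same race window as above */
    return !saw_work && p.work_pending && !p.latch_set;
}
```

In this model reset_after_check_misses() reports a missed wakeup and
reset_before_check_misses() does not, which is why the reset conventionally
precedes the check for work.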

I'll try correcting these things and we'll see if it gets any better.

			regards, tom lane

