From: | Erik Rijkers <er(at)xs4all(dot)nl> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org |
Subject: | Re: Race conditions with WAL sender PID lookups |
Date: | 2017-05-21 06:19:34 |
Message-ID: | 98d7b5bc62932d1d9b4a0b89ea735173@xs4all.nl |
Lists: | pgsql-hackers |
On 2017-05-21 06:37, Erik Rijkers wrote:
> On 2017-05-20 14:40, Michael Paquier wrote:
>> On Fri, May 19, 2017 at 3:01 PM, Masahiko Sawada
>> <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> Also, as Horiguchi-san pointed out earlier, walreceiver seems need
>>> the similar fix.
>>
>> Actually, now that I look at it, ready_to_display should as well be
>> protected by the lock of the WAL receiver, so it is incorrectly placed
>> in walreceiver.h. As you are pointing out, pg_stat_get_wal_receiver()
>> is lazy as well, and that's new in 10, so we have an open item here
>> for both of them. And I am the author for both things. No issues
>> spotted in walreceiverfuncs.c after review.
>>
>> I am adding an open item so as both issues are fixed in PG10. With the
>> WAL sender part, I think that this should be a group shot.
>>
>> So what do you think about the attached?
>
>> [walsnd-pid-races-v3.patch]
>
>
> With this patch on current master my logical replication tests
> (pgbench-over-logical-replication) run without errors for the first
> time in many days (even weeks).

Unfortunately, just now another logical-replication failure occurred,
the same one I have seen all along.

The symptom: after starting logical replication, there are no rows in
pg_stat_replication, and in the replica log logical replication complains
about max_replication_slots being too low. (From previous experience I
know that raising max_replication_slots does indeed 'help', but only
until the next (same) error occurs, with the same complaint renewed.)

Also from previous experience of this failed state, I know that it can be
'cleaned up' by manually emptying these tables:

  delete from pg_subscription_rel;
  delete from pg_subscription;
  delete from pg_replication_origin;

Then it becomes possible to start a new subscription without the above
symptoms.

I'll do some more testing and hopefully get some information that's less
vague...

Erik Rijkers