Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Subject: Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Date: 2017-10-03 17:44:12
Message-ID: 23454.1507052652@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
>> So that's trouble waiting to happen, for sure. At the very least we
>> need to do a single fetch of WalRcv->latch, not two. I wonder whether
>> even that is sufficient, though: this coding requires an atomic fetch of
>> a pointer, which is something we generally don't assume to be safe.
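The sketch below shows the hazard described in the quoted text, under stated assumptions. It contrasts testing and using a shared latch pointer with two separate fetches against fetching it once into a local. `SharedRcv`, `set_latch`, `rcv_attach`, `rcv_detach`, `wake_double_fetch` and `wake_single_fetch` are hypothetical stand-ins, not PostgreSQL's actual `WalRcvData`/`SetLatch` code, and the C11 `atomic_flag` spinlock only plays the role of `slock_t`. Reading the pointer under the lock is one way to sidestep the question of whether an unlocked pointer fetch is atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a latch; only counts how often it was set. */
typedef struct Latch
{
    int         set_count;
} Latch;

/* Hypothetical stand-in for the shared walreceiver struct. */
typedef struct SharedRcv
{
    atomic_flag mutex;          /* plays the role of slock_t */
    Latch      *latch;          /* NULL while no receiver is running */
} SharedRcv;

#define SHARED_RCV_INIT { ATOMIC_FLAG_INIT, NULL }

static void
set_latch(Latch *latch)
{
    latch->set_count++;
}

static void
spin_acquire(SharedRcv *rcv)
{
    while (atomic_flag_test_and_set_explicit(&rcv->mutex, memory_order_acquire))
        ;                       /* busy-wait until the flag was clear */
}

static void
spin_release(SharedRcv *rcv)
{
    atomic_flag_clear_explicit(&rcv->mutex, memory_order_release);
}

/* Receiver start/stop: publish or withdraw its latch under the lock. */
static void
rcv_attach(SharedRcv *rcv, Latch *latch)
{
    assert(latch != NULL);
    spin_acquire(rcv);
    rcv->latch = latch;
    spin_release(rcv);
}

static void
rcv_detach(SharedRcv *rcv)
{
    spin_acquire(rcv);
    rcv->latch = NULL;
    spin_release(rcv);
}

/*
 * Hazardous: rcv->latch is fetched twice without the lock.  If the receiver
 * detaches between the test and the call, set_latch() can receive NULL.
 */
static void
wake_double_fetch(SharedRcv *rcv)
{
    if (rcv->latch != NULL)
        set_latch(rcv->latch);
}

/*
 * Safer: fetch the pointer exactly once, under the lock, then test and use
 * only the local copy.  Returns false if no receiver was attached.
 */
static bool
wake_single_fetch(SharedRcv *rcv)
{
    Latch      *latch;

    spin_acquire(rcv);
    latch = rcv->latch;
    spin_release(rcv);

    if (latch == NULL)
        return false;
    set_latch(latch);
    return true;
}
```

The single fetch only rules out testing one value and then using another. The local copy can still point at a latch whose process has since exited. In this sketch that is tolerable only if, as with a process latch living in shared memory, the latch storage outlives its owner, so the worst case is a spurious wakeup of some other process.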

BTW, I had supposed that this bug was of long standing, but actually it's
new in v10, dating to 597a87ccc9a6fa8af7f3cf280b1e24e41807d555. Before
that walreceiver start/stop just changed the owner of a long-lived shared
latch, and there was no question of stale pointers.
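For contrast, here is a sketch of the pre-v10 arrangement described above. It assumes a latch embedded in the shared struct whose owner changes as receivers start and stop. `SharedRcvEmbedded`, `own_latch`, `disown_latch`, `set_latch` and `wake_receiver` are hypothetical simplifications loosely modeled on the idea behind OwnLatch/DisownLatch/SetLatch, not their actual implementation. The point is that the waker never fetches a pointer, so there is nothing to go stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical latch: a set flag plus the pid of whoever may wait on it. */
typedef struct Latch
{
    bool        is_set;
    pid_t       owner_pid;      /* 0 when no process owns the latch */
} Latch;

/* Hypothetical shared struct with the latch embedded, not pointed to. */
typedef struct SharedRcvEmbedded
{
    Latch       latch;
} SharedRcvEmbedded;

/* Called by a receiver at startup: take ownership of the shared latch. */
static void
own_latch(Latch *latch, pid_t pid)
{
    assert(latch->owner_pid == 0);
    latch->owner_pid = pid;
}

/* Called at receiver exit: give up ownership, but the latch itself stays. */
static void
disown_latch(Latch *latch)
{
    latch->owner_pid = 0;
}

/* A real implementation would also signal owner_pid; here we only set. */
static void
set_latch(Latch *latch)
{
    latch->is_set = true;
}

/*
 * The waker needs no pointer fetch: &rcv->latch is fixed for the lifetime
 * of the shared struct, whether or not a receiver currently owns it.
 */
static void
wake_receiver(SharedRcvEmbedded *rcv)
{
    set_latch(&rcv->latch);
}
```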

I considered reverting that decision, but the reason for it seems to have
been to let libpqwalreceiver.c manipulate MyProc->procLatch rather than
having to know about a custom latch. That's probably a sufficient reason
to justify some squishiness in the wakeup logic. Still, we might want to
revisit it if we find any other problems here.

regards, tom lane
