Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>
Subject: Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Date: 2017-10-03 18:32:22
Message-ID: 4689c67f-5ffb-7359-a3e9-4abf3908e603@2ndquadrant.com
Lists: pgsql-hackers

On 03/10/17 19:44, Tom Lane wrote:
> I wrote:
>>> So that's trouble waiting to happen, for sure. At the very least we
>>> need to do a single fetch of WalRcv->latch, not two. I wonder whether
>>> even that is sufficient, though: this coding requires an atomic fetch of
>>> a pointer, which is something we generally don't assume to be safe.
>
> BTW, I had supposed that this bug was of long standing, but actually it's
> new in v10, dating to 597a87ccc9a6fa8af7f3cf280b1e24e41807d555. Before
> that walreceiver start/stop just changed the owner of a long-lived shared
> latch, and there was no question of stale pointers.
>
> I considered reverting that decision, but the reason for it seems to have
> been to let libpqwalreceiver.c manipulate MyProc->procLatch rather than
> having to know about a custom latch. That's probably a sufficient reason
> to justify some squishiness in the wakeup logic. Still, we might want to
> revisit it if we find any other problems here.
That's correct, and because the other users of that library don't have a
special latch, it seemed feasible to standardize on the procLatch. If we
did want to change that decision, I think we could simply pass the latch
as a parameter to libpqrcv_connect and store it in the WalReceiverConn
struct. The connection does not need to live past the latch (it currently
does, but that's just a matter of reordering the code in WalRcvDie() a
little, AFAICS).

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
