Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Date: 2017-10-02 21:54:20
Message-ID: 1409.1506981260@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> If this is the only problem then I'd agree we should stick to a spinlock
> (I assume the strings in question can't be very long). I was thinking
> more about what to do if we find other violations that are harder to fix.

I took a quick look through walreceiver.c, and there are a couple of
obvious problems of the same ilk in WalReceiverMain, which is doing this:

walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = GetCurrentTimestamp();

(ie, a potential kernel call) inside a spinlock. But there seems no
real problem with just collecting the timestamp before we enter that
critical section.

I also don't especially like the fact that just above there it reaches
elog(PANIC) with the lock still held, though at least that's a case that
should never happen.

Further down, it's doing a pfree() inside the spinlock, apparently
for no other reason than to save one "if (tmp_conninfo)".

I don't especially like the Asserts inside spinlocks, either. Personally,
I think if those conditions are worth testing then they're worth testing
for real (in production). Variables that are manipulated by multiple
processes are way more likely to assume unexpected states than local
variables.

I'm also rather befuddled by the fact that this code sets and clears
walrcv->latch outside the critical sections. If that field is used
by any other process, surely that's completely unsafe. If it isn't,
why is it being kept in shared memory?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-10-02 21:57:51 Re: Binary search in fmgr_isbuiltin() is a bottleneck.
Previous Message Andres Freund 2017-10-02 21:44:16 Re: Binary search in fmgr_isbuiltin() is a bottleneck.