Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Date: 2017-10-03 14:07:14
Message-ID: 20171003140714.ozcduitwplj3ropj@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Fixed the pstrdup problem by replacing with strlcpy() to stack-allocated
variables (rather than palloc + memcpy as proposed in Michael's patch).

About the other problems:

Tom Lane wrote:

> I took a quick look through walreceiver.c, and there are a couple of
> obvious problems of the same ilk in WalReceiverMain, which is doing this:
>
> walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = GetCurrentTimestamp();
>
> (ie, a potential kernel call) inside a spinlock. But there seems no
> real problem with just collecting the timestamp before we enter that
> critical section.

Done that way.

> I also don't especially like the fact that just above there it reaches
> elog(PANIC) with the lock still held, though at least that's a case that
> should never happen.

Fixed by releasing spinlock just before elog.

> Further down, it's doing a pfree() inside the spinlock, apparently
> for no other reason than to save one "if (tmp_conninfo)".

Fixed.

> I don't especially like the Asserts inside spinlocks, either. Personally,
> I think if those conditions are worth testing then they're worth testing
> for real (in production). Variables that are manipulated by multiple
> processes are way more likely to assume unexpected states than local
> variables.

I didn't change these. It doesn't look to me that these asserts are
worth very much as production code.

> I'm also rather befuddled by the fact that this code sets and clears
> walrcv->latch outside the critical sections. If that field is used
> by any other process, surely that's completely unsafe. If it isn't,
> why is it being kept in shared memory?

I think the latch is only used locally. Seems that it was only put in
shmem to avoid a separate variable ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sokolov Yura 2017-10-03 14:30:03 Two pass CheckDeadlock in contentent case
Previous Message Tom Lane 2017-10-03 14:03:25 Re: [PATCH] Improve geometric types