From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)postgresql(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Subject: | Re: [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM |
Date: | 2017-10-03 14:07:14 |
Message-ID: | 20171003140714.ozcduitwplj3ropj@alvherre.pgsql |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Fixed the pstrdup problem by replacing with strlcpy() to stack-allocated
variables (rather than palloc + memcpy as proposed in Michael's patch).
About the other problems:
Tom Lane wrote:
> I took a quick look through walreceiver.c, and there are a couple of
> obvious problems of the same ilk in WalReceiverMain, which is doing this:
>
> walrcv->lastMsgSendTime = walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = GetCurrentTimestamp();
>
> (ie, a potential kernel call) inside a spinlock. But there seems no
> real problem with just collecting the timestamp before we enter that
> critical section.
Done that way.
> I also don't especially like the fact that just above there it reaches
> elog(PANIC) with the lock still held, though at least that's a case that
> should never happen.
Fixed by releasing spinlock just before elog.
> Further down, it's doing a pfree() inside the spinlock, apparently
> for no other reason than to save one "if (tmp_conninfo)".
Fixed.
> I don't especially like the Asserts inside spinlocks, either. Personally,
> I think if those conditions are worth testing then they're worth testing
> for real (in production). Variables that are manipulated by multiple
> processes are way more likely to assume unexpected states than local
> variables.
I didn't change these. It doesn't look to me that these asserts are
worth very much as production code.
> I'm also rather befuddled by the fact that this code sets and clears
> walrcv->latch outside the critical sections. If that field is used
> by any other process, surely that's completely unsafe. If it isn't,
> why is it being kept in shared memory?
I think the latch is only used locally. Seems that it was only put in
shmem to avoid a separate variable ...
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Sokolov Yura | 2017-10-03 14:30:03 | Two pass CheckDeadlock in contentent case |
Previous Message | Tom Lane | 2017-10-03 14:03:25 | Re: [PATCH] Improve geometric types |