Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
Date: 2018-07-16 14:39:26
Message-ID: 20436.1531751966@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2018-07-15 18:48:43 -0400, Tom Lane wrote:
>> So basically, WAL replay hits an error while holding a buffer pin, and
>> nothing is done to release the buffer pin, but AtProcExit_Buffers thinks
>> something should have been done.

> I think there's a few other cases where we hit this. I've seen something
> similar from inside checkpointer / BufferSync(). I'd be surprised if
> bgwriter couldn't be triggered into the same.

Hm, yeah, on reflection it's pretty obvious that those are hazard cases.

> I'm pretty sure that we do *not* force a panic on all nonzero-exit-code
> cases for other subprocesses.

That's my recollection as well -- mostly, we just start a new one.

So I said I didn't want to do extra work on this, but I am looking into
fixing it by having these aux process types run a ResourceOwner that can
be told to clean up any open buffer pins at exit. We could be sure the
coverage is complete by dint of removing the special-case code in
resowner.c that allows buffer pins to be taken with no active resowner.
Then CheckForBufferLeaks can be left as-is, ie something we do only
in assert builds.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2018-07-16 14:42:23 Re: Libpq support to connect to standby server as priority
Previous Message Robert Haas 2018-07-16 14:38:14 Re: patch to allow disable of WAL recycling