Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, pryzby(at)telsasoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
Date: 2018-07-18 03:42:20
Message-ID: 20180718.124220.08087886.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello. I confirmed that this patch fixes the crash.

At Tue, 17 Jul 2018 20:01:05 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <14892(dot)1531872065(at)sss(dot)pgh(dot)pa(dot)us>
> I wrote:
> >> So I said I didn't want to do extra work on this, but I am looking into
> >> fixing it by having these aux process types run a ResourceOwner that can
> >> be told to clean up any open buffer pins at exit.
>
> > That turned out to be a larger can of worms than I'd anticipated, as it
> > soon emerged that we'd acquired a whole lot of cargo-cult programming
> > around ResourceOwners. ...
> > I'm mostly pretty happy with this, but I think there are a couple of
> > loose ends in logicalfuncs.c and slotfuncs.c: those are creating
> > non-standalone ResourceOwners (children of whatever the active
> > ResourceOwner is) and doing nothing much to clean them up. That seems
> > pretty bogus.
>
> Further investigation showed that the part of that code that was
> actually needed was not the creation of a child ResourceOwner, but rather
> restoration of the old CurrentResourceOwner setting after exiting the
> logical decoding loop. Apparently we can come out of that with the
> TopTransaction resowner being active. This still seems a bit bogus;
> maybe there should be a save-and-restore happening somewhere else?
> But I'm not really excited about doing more than commenting it.

CurrentResourceOwner doesn't seem to be changed. It is just saved
during snapshot export and used just as a flag only for checking
for duplicate snapshot exporting.

> Also, most of the other random creations of ResourceOwners seem to just
> not be necessary at all, even with the new rule that you must have a
> resowner to acquire buffer pins. So the attached revised patch just
> gets rid of them, and improves some misleading/wrong comments on the
> topic. It'd still be easy to use CreateAuxProcessResourceOwner in any
> process where we discover we need one, but I don't see the value in adding
> useless overhead.

+1 for unifying to resowner for auxiliary processes. I found the
comment below just before ending cleanup of auxiliary process
main funcs.

| * These operations are really just a minimal subset of
| * AbortTransaction(). We don't have very many resources to worry
| * about in checkpointer, but we do have LWLocks, buffers, and temp
| * files.

So couldn't we use TopTransactionResourceOwner instead of
AuxProcessResrouceOwner? I feel a bit uneasy that bootstrap and
standalone-backend have *AuxProcess*ResourceOwner.

It's not about this ptch, but while looking this closer, I found
the following comment on ShutdownXLOG().

| /*
| * This must be called ONCE during postmaster or standalone-backend shutdown
| */

Is the "postmaster" typo of "bootstrap process"? It is also
called from checkpointer when non-standlone mode.

> At this point I'm leaning to just applying this in HEAD and calling it
> good. The potential for assertion failures isn't relevant to production
> builds, and my best guess at this point is that there isn't really any
> other user-visible bug. The resowners that were being created and not
> adequately cleaned up all seem to have been dead code anyway (ie they'd
> never acquire any resources).

Agreed.

> We could have a problem if, say, the
> bgwriter exited via the FATAL path while holding a pin, but I don't think
> there's a reason for it to do that except in a database-wide shutdown,
> where the consequences of a leaked pin seem pretty minimal.
>
> Any objections? Anyone want to do further review?

FWIW I think we won't be concerned about leaked pins after FATAL.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-07-18 04:02:35 Re: missing toast table for pg_policy
Previous Message Craig Ringer 2018-07-18 03:12:06 Re: [bug fix] Produce a crash dump before main() on Windows