Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, pryzby(at)telsasoft(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ENOSPC FailedAssertion("!(RefCountErrors == 0)"
Date: 2018-07-18 14:33:01
Message-ID: 10690.1531924381@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Hello. I confirmed that this patch fixes the crash.

Thanks for double-checking.

> At Tue, 17 Jul 2018 20:01:05 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <14892(dot)1531872065(at)sss(dot)pgh(dot)pa(dot)us>
>> Further investigation showed that the part of that code that was
>> actually needed was not the creation of a child ResourceOwner, but rather
>> restoration of the old CurrentResourceOwner setting after exiting the
>> logical decoding loop. Apparently we can come out of that with the
>> TopTransaction resowner being active.

> CurrentResourceOwner doesn't seem to be changed. It is just saved
> during snapshot export and used just as a flag only for checking
> for duplicate snapshot exporting.

I tried to remove "CurrentResourceOwner = old_resowner" from
logicalfuncs.c, and got a failure in the contrib/test_decoding test.
I investigated the reason and found that it was because
CurrentResourceOwner was equal to TopTransactionResourceOwner when we
come out of the decoding loop. If we don't restore it then subsequent
executor activity gets logged in the wrong resowner.

It is true that the one in slotfuncs.c seems to be unnecessary,
but I thought it best to leave it there; it's cheap, and maybe
there's a problem in cases not exercised in the regression tests.

> So couldn't we use TopTransactionResourceOwner instead of
> AuxProcessResrouceOwner? I feel a bit uneasy that bootstrap and
> standalone-backend have *AuxProcess*ResourceOwner.

Since the aux processes aren't running transactions, I didn't think
that TopTransactionResourceOwner was appropriate. There's also
a problem for bootstrap and standalone backend cases: those do run
transactions and therefore create/destroy TopTransactionResourceOwner,
leaving nothing behind for ShutdownXLOG to use if it tries to use
that. We need an extra resowner somewhere.

It's true that if you adopt a narrow definition of "aux process"
as being "one that goes through AuxiliaryProcessMain", calling
this extra resowner AuxProcessResourceOwner is a bit of a misnomer.
I couldn't think of something better to call it, though. With a
slightly wider understanding of "auxiliary process" as being anything
that's not the postmaster or a client-session backend, it's fine.

> It's not about this ptch, but while looking this closer, I found
> the following comment on ShutdownXLOG().
> | /*
> | * This must be called ONCE during postmaster or standalone-backend shutdown
> | */
> Is the "postmaster" typo of "bootstrap process"? It is also
> called from checkpointer when non-standlone mode.

Well, it's actually executed by the checkpointer these days, but
that's an implementation detail. I think this comment is fine:
at some point while the postmaster is shutting down, this should
be done, and done just once.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mai Peng 2018-07-18 14:35:10 Re: Segfault logical replication PG 10.4
Previous Message Tomas Vondra 2018-07-18 14:08:37 Re: [HACKERS] logical decoding of two-phase transactions