Re: [TRAP: FailedAssertion] causing server to crash

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [TRAP: FailedAssertion] causing server to crash
Date: 2017-08-02 15:03:15
Message-ID: CA+TgmoYSg+1PtN_wpqPD8f8Xfepvkf_5PPewc9DRuYGipBWiLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 21, 2017 at 1:31 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Thanks Neha. It's be best to post the back trace and if possible
> print oldestXact and ShmemVariableCache->oldestXid from the stack
> frame for TruncateCLOG.
>
> The failing assertion in TruncateCLOG() has a comment that says
> "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
> calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
> *after* it calls TruncateCLOG(). What am I missing here?

This problem was introduced by commit
ea42cc18c35381f639d45628d792e790ff39e271, so this should be added to
the PostgreSQL 10 open items list. That commit intended to introduce a
distinction between (1) the oldest XID that can be safely examined and
(2) the oldest XID that can't yet be safely reused. These are the
same except when we're in the middle of truncating CLOG: (1) advances
before the truncation, and (2) advances afterwards. That's why
AdvanceOldestClogXid() happens before truncation proper and
SetTransactionIdLimit() happens afterwards, and changing the order
would, I think, be quite wrong.

AFAICS, that assertion is simply a holdover from an earlier version of
the patch that escaped review. There's just no reason to suppose that
it's true.

> What actually prevents ShmemVariableCache->oldestXid from going
> backwards anyway? Suppose there are two or more autovacuum processes
> that reach vac_truncate_clog() concurrently. They do a scan of
> pg_database whose tuples they access without locking through a
> pointer-to-volatile because they expect concurrent in-place writers,
> come up with a value for frozenXID, and then arrive at
> SetTransactionIdLimit() in whatever order and clobber
> ShmemVariableCache->oldestXid. What am I missing here?

Hmm, there could be a bug there, but I don't think it's *this* bug.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-02 15:17:23 Re: Macros bundling RELKIND_* conditions
Previous Message Peter Eisentraut 2017-08-02 14:49:44 Re: typo for using "OBJECT_TYPE" for "security label on domain" in "gram.y"