| From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: [TRAP: FailedAssertion] causing server to crash | 
| Date: | 2017-08-07 05:29:34 | 
| Message-ID: | CAEepm=0ZsP2HRZ9W-9pPftcwjz1BJZma2nuiFuPhACDMGSiprA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Thu, Aug 3, 2017 at 3:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jul 21, 2017 at 1:31 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Thanks Neha.  It's be best to post the back trace and if possible
>> print oldestXact and ShmemVariableCache->oldestXid from the stack
>> frame for TruncateCLOG.
>>
>> The failing assertion in TruncateCLOG() has a comment that says
>> "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
>> calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
>> *after* it calls TruncateCLOG().  What am I missing here?
>
> This problem was introduced by commit
> ea42cc18c35381f639d45628d792e790ff39e271, so this should be added to
> the PostgreSQL 10 open items list. That commit intended to introduce a
> distinction between (1) the oldest XID that can be safely examined and
> (2) the oldest XID that can't yet be safely reused.  These are the
> same except when we're in the middle of truncating CLOG: (1) advances
> before the truncation, and (2) advances afterwards. That's why
> AdvanceOldestClogXid() happens before truncation proper and
> SetTransactionIdLimit() happens afterwards, and changing the order
> would, I think, be quite wrong.
Added to open items.
> AFAICS, that assertion is simply a holdover from an earlier version of
> the patch that escaped review.  There's just no reason to suppose that
> it's true.
Craig, are you planning to post a patch to remove the assertion, or
make it hold?
>> What actually prevents ShmemVariableCache->oldestXid from going
>> backwards anyway?  Suppose there are two or more autovacuum processes
>> that reach vac_truncate_clog() concurrently.  They do a scan of
>> pg_database whose tuples they access without locking through a
>> pointer-to-volatile because they expect concurrent in-place writers,
>> come up with a value for frozenXID, and then arrive at
>> SetTransactionIdLimit() in whatever order and clobber
>> ShmemVariableCache->oldestXid.  What am I missing here?
>
> Hmm, there could be a bug there, but I don't think it's *this* bug.
I'm not sure that it wrong per se, as long as no one asserts that the
number can't go backwards...
-- 
Thomas Munro
http://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Khandekar | 2017-08-07 05:37:50 | Re: expanding inheritance in partition bound order | 
| Previous Message | Masahiko Sawada | 2017-08-07 04:44:27 | Re: Small code improvement for btree |