Re: [TRAP: FailedAssertion] causing server to crash

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: neha(dot)sharma(at)enterprisedb(dot)com
Cc: thomas(dot)munro(at)enterprisedb(dot)com, craig(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [TRAP: FailedAssertion] causing server to crash
Date: 2017-07-21 07:17:29
Message-ID: 20170721.161729.140149762.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

At Fri, 21 Jul 2017 11:39:38 +0530, Neha Sharma <neha(dot)sharma(at)enterprisedb(dot)com> wrote in <CANiYTQuZm+hDvuHB14d65SkL2ko98ESR3Jf2kUiX=m1haL=xrg(at)mail(dot)gmail(dot)com>
> Here is the back trace from the core dump attached.
>
> (gdb) bt
> #0 0x00007f4a71424495 in raise () from /lib64/libc.so.6
> #1 0x00007f4a71425c75 in abort () from /lib64/libc.so.6
> #2 0x00000000009dc18a in ExceptionalCondition (conditionName=0xa905d0
> "!(TransactionIdPrecedesOrEquals(oldestXact,
> ShmemVariableCache->oldestXid))",
> errorType=0xa9044f "FailedAssertion", fileName=0xa90448 "clog.c",
> lineNumber=683) at assert.c:54
> #3 0x0000000000524215 in TruncateCLOG (oldestXact=150036635,
> oldestxid_datoid=13164) at clog.c:682

In vac_truncate_clog, TruncateCLOG is called before
SetTransactionIdLimit, and it is SetTransactionIdLimit that
advances ShmemVariableCache->oldestXid. If the assertion in
TruncateCLOG is valid, the two calls should be made in the
opposite order. I believe CLOG files can safely be truncated
after the XID limits have been advanced.
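
To make the ordering concrete, here is a toy model in C using the two values from the backtrace above (oldestXact=150036635, oldestXid=548). It is only a sketch and not PostgreSQL source: every toy_* name is hypothetical, toy_set_xid_limit does nothing but store the new value, and the comparison is assumed to follow the usual circular 32-bit XID rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below 3 are treated as special here (assumption of this sketch). */
#define TOY_FIRST_NORMAL_XID ((TransactionId) 3)

/* Hypothetical stand-in for ShmemVariableCache->oldestXid. */
static TransactionId toy_oldest_xid;

/* Circular "a precedes or equals b" on 32-bit XIDs. */
static bool
toy_xid_precedes_or_equals(TransactionId a, TransactionId b)
{
    if (a < TOY_FIRST_NORMAL_XID || b < TOY_FIRST_NORMAL_XID)
        return a <= b;
    return (int32_t) (a - b) <= 0;
}

/* Toy SetTransactionIdLimit: only records the new oldest XID. */
static void
toy_set_xid_limit(TransactionId oldest_datfrozenxid)
{
    toy_oldest_xid = oldest_datfrozenxid;
}

/* Toy TruncateCLOG: reports whether the asserted condition holds. */
static bool
toy_truncate_precondition(TransactionId oldest_xact)
{
    return toy_xid_precedes_or_equals(oldest_xact, toy_oldest_xid);
}

/* Order seen in the backtrace: truncate first, then advance the limit. */
static bool
toy_truncate_then_advance(TransactionId start_oldest, TransactionId frozen_xid)
{
    bool ok;

    toy_oldest_xid = start_oldest;
    ok = toy_truncate_precondition(frozen_xid);
    toy_set_xid_limit(frozen_xid);
    return ok;
}

/* Proposed order: advance the limit first, then truncate. */
static bool
toy_advance_then_truncate(TransactionId start_oldest, TransactionId frozen_xid)
{
    toy_oldest_xid = start_oldest;
    toy_set_xid_limit(frozen_xid);
    return toy_truncate_precondition(frozen_xid);
}
```

In this model, toy_truncate_then_advance(548, 150036635) reports that the condition does not hold, while toy_advance_then_truncate(548, 150036635) reports that it does.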

By the way, the attached patch was made with "git diff --patience".

filterdiff converts it into a somewhat wrong shape. Specifically,
the result is missing the added lines of the diff, as in the
second attached patch. I'm not sure whether git (2.9.2),
filterdiff (0.3.3), or I am doing something wrong.

> #4 0x00000000006a6be8 in vac_truncate_clog (frozenXID=150036635,
> minMulti=1, lastSaneFrozenXid=200562449, lastSaneMinMulti=1) at
> vacuum.c:1197
> #5 0x00000000006a6948 in vac_update_datfrozenxid () at vacuum.c:1063
> #6 0x00000000007ce0a2 in do_autovacuum () at autovacuum.c:2625
> #7 0x00000000007cc987 in AutoVacWorkerMain (argc=0, argv=0x0) at
> autovacuum.c:1715
> #8 0x00000000007cc562 in StartAutoVacWorker () at autovacuum.c:1512
> #9 0x00000000007e2acd in StartAutovacuumWorker () at postmaster.c:5414
> #10 0x00000000007e257e in sigusr1_handler (postgres_signal_arg=10) at
> postmaster.c:5111
> #11 <signal handler called>
> #12 0x00007f4a714d3603 in __select_nocancel () from /lib64/libc.so.6
> #13 0x00000000007dde88 in ServerLoop () at postmaster.c:1717
> #14 0x00000000007dd67d in PostmasterMain (argc=3, argv=0x2eb8b00) at
> postmaster.c:1361
> #15 0x000000000071a218 in main (argc=3, argv=0x2eb8b00) at main.c:228
> (gdb) print ShmemVariableCache->oldestXid
> $3 = 548
>
>
> Regards,
> Neha Sharma
>
> On Fri, Jul 21, 2017 at 11:01 AM, Thomas Munro <
> thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>
> > On Fri, Jul 21, 2017 at 4:16 PM, Neha Sharma
> > <neha(dot)sharma(at)enterprisedb(dot)com> wrote:
> > >
> > > Attached is the core dump file received on PG 10beta2 version.
> >
> > Thanks Neha. It'd be best to post the back trace and if possible
> > print oldestXact and ShmemVariableCache->oldestXid from the stack
> > frame for TruncateCLOG.
> >
> > The failing assertion in TruncateCLOG() has a comment that says
> > "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
> > calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
> > *after* it calls TruncateCLOG(). What am I missing here?
> >
> > What actually prevents ShmemVariableCache->oldestXid from going
> > backwards anyway? Suppose there are two or more autovacuum processes
> > that reach vac_truncate_clog() concurrently. They do a scan of
> > pg_database whose tuples they access without locking through a
> > pointer-to-volatile because they expect concurrent in-place writers,
> > come up with a value for frozenXID, and then arrive at
> > SetTransactionIdLimit() in whatever order and clobber
> > ShmemVariableCache->oldestXid. What am I missing here?
> >
> > --
> > Thomas Munro
> > http://www.enterprisedb.com
> >

--
Kyotaro Horiguchi

Nippon Telegraph and Telephone Corporation, NTT Open Source Software Center
Phone: 03-5860-5115 / Fax: 03-5463-5490

Attachment Content-Type Size
truncateCLOG_after_advancing_xid_limits.patch text/x-patch 1.1 KB
BROKEN_truncateCLOG_after_advancing_xid_limits.patch text/x-patch 658 bytes
