Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: r(dot)zharkov(at)postgrespro(dot)ru, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
Date: 2019-04-06 17:10:25
Message-ID: 20190406171025.x7mbhp6kct75oqny@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2019-04-06 09:28:46 -0700, Andres Freund wrote:
> On 2019-04-06 12:23:06 -0400, Tom Lane wrote:
> > It seems that there may be some connection between this problem and
> > EPQ. I was working on committing Amit's fix for bug #15677, which
> > demonstrated that EPQ doesn't work for partitioned-table target rels.
> > It seemed like there really needed to be regression test coverage for
> > that, so I tried to convert his crasher example into an isolation test.
> > It does indeed crash without Amit's fix ... but with it, lookee what
> > I get:
> >
> > +error in steps c1 complexpartupdate: ERROR: unexpected table_lock_tuple status: 1
> >
> > That seems fully reproducible in this test. I haven't looked into
> > exactly what's causing that, but now that we have a reproducible
> > example, somebody should.
> >
> > I'm not quite sure if I should commit this as-is or wait till the
> > other problem is fixed. A crash is probably worse than a bogus
> > error, but I don't like committing obviously-wrong "expected" output.
> > Thoughts?
>
> Let me have a look at the testcase - I'd been running Roman's testcase
> for quite a few hours without being able to reproduce. But your testcase
> seems to trigger this reliably, so I hope I can make some quick
> progress.

Hm. I see what's wrong here - the new code assumed that we couldn't get
a SelfModified because the first version of the to-be-(deleted|updated)
tuple was visible. To properly discern that from the TM_Deleted case,
I had to change/fix heapam_lock_tuple's follow-the-update-chain logic to
return SelfModified, rather than Invisible, in this case (I don't think
we want to allow Invisible - we'd have to have waited for the earlier
tuple version) - which is a more accurate return code anyway.

I'm still not understanding how that'd be possible in Roman's
case. Given the workload there never should be any self updating going
on?

Heavily-WIP patch attached.

I noticed that we say
+    ereport(ERROR,
+            (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+             errmsg("tuple to be updated was already modified by an operation triggered by the current command"),

in the ExecDelete() case (that's not new), which seems odd, since the
message talks about an update.

I think my fix would need a non-partition reproducer. I'll work on that
and on polishing the patch after having a coffee.

Greetings,

Andres Freund

Attachment Content-Type Size
fix-repeated-self-mod-chain.diff text/x-diff 3.1 KB
