|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>|
|Cc:||r(dot)zharkov(at)postgrespro(dot)ru, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org|
|Subject:||Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed|
On 2019-04-06 09:28:46 -0700, Andres Freund wrote:
> On 2019-04-06 12:23:06 -0400, Tom Lane wrote:
> > It seems that there may be some connection between this problem and
> > EPQ. I was working on committing Amit's fix for bug #15677, which
> > demonstrated that EPQ doesn't work for partitioned-table target rels.
> > It seemed like there really needed to be regression test coverage for
> > that, so I tried to convert his crasher example into an isolation test.
> > It does indeed crash without Amit's fix ... but with it, lookee what
> > I get:
> > +error in steps c1 complexpartupdate: ERROR: unexpected table_lock_tuple status: 1
> > That seems fully reproducible in this test. I haven't looked into
> > exactly what's causing that, but now that we have a reproducible
> > example, somebody should.
> > I'm not quite sure if I should commit this as-is or wait till the
> > other problem is fixed. A crash is probably worse than a bogus
> > error, but I don't like committing obviously-wrong "expected" output.
> > Thoughts?
> Let me have a look at the testcase - I'd been running Roman's testcase
> for quite a few hours without being able to reproduce. But your testcase
> seems to trigger this reliably, so I hope I can make some quick progress.
Hm. I see what's wrong here: the new code assumed that we couldn't get a
SelfModified, because the first version of the to-be-(deleted|updated)
tuple was visible. To properly distinguish that from the TM_Deleted case,
I had to change/fix heapam_lock_tuple's follow-the-update-chain logic to
return SelfModified rather than Invisible in this case, which is a more
accurate return code anyway. (I don't think we want to allow Invisible
here - we'd have to have waited for the earlier tuple version.)
I still don't understand how that'd be possible in Roman's
case. Given the workload there never should be any self updating going on.
Heavily-WIP patch attached.
I noticed that we say
+ errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
in the ExecDelete() case (that's not new), which seems odd for a delete.
I think my fix would need a non-partition reproducer. I'll work on that,
and on polishing the patch, after having a coffee.