Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed

From: Andres Freund <andres(at)anarazel(dot)de>
To: r(dot)zharkov(at)postgrespro(dot)ru
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
Date: 2019-04-06 17:17:05
Message-ID: 20190406171705.dogsasgftooz5rf5@alap3.anarazel.de
Lists: pgsql-bugs

Hi,

On 2019-04-07 00:09:15 +0700, r(dot)zharkov(at)postgrespro(dot)ru wrote:
> On 2019-04-06 23:28, Andres Freund wrote:
> > Hi,
> >
> > Let me have a look at the testcase - I'd been running Roman's testcase
> > for quite a few hours without being able to reproduce. But your testcase
> > seems to trigger this reliably, so I hope I can make some quick
> > progress.
> >
> > - Andres
>
> Hello,
> I am trying to find the bad commit using bisect, but it takes a very
> long time.

I'd be very surprised if it weren't

commit 5db6df0c0117ff2a4e0cd87594d2db408cd5022f
Author: Andres Freund <andres(at)anarazel(dot)de>
Date: 2019-03-23 19:55:57 -0700

tableam: Add tuple_{insert, delete, update, lock} and use.

I just sent a fix for the issue Tom just reported, but I don't quite see
how it applies to your case, given that there is - as far as I
understand - only a single statement per transaction, no triggers
(including foreign keys), no CTEs etc. But it'd sure be interesting if my
fix changes his error into triggering on TM_SelfModified rather than
TM_Invisible.

I'm kinda wondering if your / Roman's case is exposing a race condition
somewhere (like wrong order of clog / procarray checks or such) that
previously wasn't user visible.

I think we probably should expand the error messages for the unexpected
cases to include the tid of the failed tuple (both original and
followed) - then we could at least look through the heap and WAL to get
more understanding.
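
To make that concrete, here is a rough standalone sketch of the kind of
message I have in mind. It is not PostgreSQL code: DemoTid and
format_unexpected_result are hypothetical names made up for illustration,
and the message wording is only an assumption about what might be useful.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a heap tuple identifier:
 * (block number, line pointer offset within the block).
 */
typedef struct DemoTid
{
    uint32_t block;
    uint16_t offset;
} DemoTid;

/*
 * Format a diagnostic that names both the tid the operation started from
 * and the tid reached after following the update chain.
 * Returns whatever snprintf returns (length that was or would be written).
 */
static int
format_unexpected_result(char *buf, size_t buflen, const char *result_name,
                         DemoTid original, DemoTid followed)
{
    return snprintf(buf, buflen,
                    "unexpected result %s for tuple (%u,%u), followed to (%u,%u)",
                    result_name,
                    (unsigned) original.block, (unsigned) original.offset,
                    (unsigned) followed.block, (unsigned) followed.offset);
}
```

With both tids in the message, the affected block could then be looked up
in the heap and in WAL after the fact.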

> The error reproduces with the default config using 24 clients (the server
> has 24 CPUs):
> pgbench test -j 12 -T 36000 -f ycsb_read_zipf.sql -f ycsb_update_zipf.sql -c 24 -P 60
> It does not reproduce when updating only a single record.

I ran it for about 9 hours overnight without triggering the error, though
on a computer with fewer CPUs.

Greetings,

Andres Freund
