Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed

From: r(dot)zharkov(at)postgrespro(dot)ru
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
Date: 2019-04-02 17:04:15
Message-ID: b5d89910ec01a657530d1a49ff0e05dc@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2019-04-02 23:13, Andres Freund wrote:
> Hi,
>
> On 2019-04-02 12:51:08 -0300, Alvaro Herrera wrote:
>> On 2019-Apr-02, Tom Lane wrote:
>>
>> > Andres Freund <andres(at)anarazel(dot)de> writes:
>> > >> 2019-04-01 15:27:38.829 +07 [7524] STATEMENT: UPDATE pgbench_accounts SET
>> > >> abalance = 1 WHERE aid = 1;
>> > >> 2019-04-01 15:27:38.829 +07 [7524] PANIC: cannot abort transaction
>> > >> 400048276, it was already committed
>> >
>> > > But that's probably a separate issue.
>> >
>> > What that seems to indicate is that the "unexpected table_lock_tuple
>> > status" error was thrown during commit, which seems pretty odd.
>>
>> AFAICS this error can only come from ExecDelete(), because the value 1
>> is TM_Invisible and the other callsites where the "unexpected
>> table_lock_tuple" error appears use different wording for that one.
>
> Hm? Why couldn't it be the ExecUpdate() case?
>
>
>> Maybe it's the result of a deferred constraint being checked at that
>> time ... maybe it's trying to honor an "on cascade delete" setting for
>> an FK, and the affected tuple has already been updated or deleted?
>
> Then it ought to get TM_Deleted, no? We ought to wait till that
> transaction commits, and then roll back. So there's something odd
> happening here. I suspect there has to be some additional log output
> or
> such to explain this.
>
> Greetings,
>
> Andres Freund

I made the new core file:

[New LWP 5967]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test 127.0.0.1(35888) UPDATE
'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fe194769428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fe194769428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fe19476b02a in __GI_abort () at abort.c:89
#2 0x0000000000a9c0ac in errfinish (dummy=0) at elog.c:552
#3 0x0000000000a9e653 in elog_finish (elevel=22, fmt=0xb695a0 "cannot
abort transaction %u, it was already committed") at elog.c:1365
#4 0x0000000000562c87 in RecordTransactionAbort (isSubXact=false) at
xact.c:1644
#5 0x0000000000563c93 in AbortTransaction () at xact.c:2670
#6 0x0000000000564516 in AbortCurrentTransaction () at xact.c:3191
#7 0x00000000009145b9 in PostgresMain (argc=1, argv=0x255d970,
dbname=0x255d950 "test", username=0x252dd88 "zharkov") at
postgres.c:4033
#8 0x000000000086406c in BackendRun (port=0x2556020) at
postmaster.c:4399
#9 0x00000000008637cd in BackendStartup (port=0x2556020) at
postmaster.c:4090
#10 0x000000000085facf in ServerLoop () at postmaster.c:1703
#11 0x000000000085f2dd in PostmasterMain (argc=3, argv=0x252bcf0) at
postmaster.c:1376
#12 0x000000000077aeba in main (argc=3, argv=0x252bcf0) at main.c:228

Log:
2019-04-02 21:51:19.639 +07 [5967] STATEMENT: UPDATE pgbench_accounts
SET abalance = 1 WHERE aid = 1;
2019-04-02 21:51:19.639 +07 [5967] PANIC: cannot abort transaction
200011850, it was already committed
2019-04-02 21:53:57.924 +07 [2473] LOG: server process (PID 5967) was
terminated by signal 6: Aborted
2019-04-02 21:53:57.924 +07 [2473] DETAIL: Failed process was running:
UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 1;
2019-04-02 21:53:57.924 +07 [2473] LOG: terminating any other active
server processes

We run pgbench with two custom scripts.

First:
\set i random_zipfian(1, 100000 * :scale, 1.01)
SELECT abalance FROM pgbench_accounts WHERE aid = :i;

Second:
\set i random_zipfian(1, 100000 * :scale, 1.01)
UPDATE pgbench_accounts SET abalance = 1 WHERE aid = :i;

pgbench connects to the database through the connection pooller called
Odyssey.
We use pool of 16 or 32 backends.
We run four series of pgbench tests with different connections count (
100, 150, 200, ... , 1000 )
Three series with pool size 16 runs normally.
But fourth series fails with 200-300 connections. It took about five
hours to reproduce the error this time.

Here is the little piece of the pgbench log:

Bench start
Tue Apr 2 14:43:12 UTC 2019
Timestamp 1554216192
transaction type: multiple scripts
scaling factor: 1
query mode: simple
number of clients: 200
number of threads: 32
duration: 300 s
number of transactions actually processed: 8062981
latency average = 7.442 ms
tps = 26875.375100 (including connections establishing)
tps = 26875.694564 (excluding connections establishing)
SQL script 1: ./sql/ycsb_read_zipf.sql
- weight: 1 (targets 50.0% of total)
- 4019635 transactions (49.9% of total, tps = 13398.171023)
- latency average = 6.955 ms
- latency stddev = 8.366 ms
SQL script 2: ./sql/ycsb_update_zipf.sql
- weight: 1 (targets 50.0% of total)
- 4021508 transactions (49.9% of total, tps = 13404.414071)
- latency average = 7.909 ms
- latency stddev = 8.478 ms
Bench end
Tue Apr 2 14:48:12 UTC 2019
Timestamp 1554216492
Bench start
Tue Apr 2 14:48:12 UTC 2019
Timestamp 1554216492
client 220 script 1 aborted in command 1 query 0: PANIC: cannot abort
transaction 200011850, it was already committed
WARNING: terminating connection because of crash of another server
process

--
regards,

Roman Zharkov

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Jeff Janes 2019-04-02 17:10:48 Re: BUG #15722: 9000x performance regression on query starting PostgreSQL 10 due to bad query plan
Previous Message Alvaro Herrera 2019-04-02 17:00:51 Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed