Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed

From: r(dot)zharkov(at)postgrespro(dot)ru
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
Date: 2019-04-11 03:29:03
Message-ID: 17809a9ce6649f6b9095d782d73ed26b@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2019-04-07 00:17, Andres Freund wrote:

> I'd be very surprised if it weren't
>
> commit 5db6df0c0117ff2a4e0cd87594d2db408cd5022f
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: 2019-03-23 19:55:57 -0700
>
> tableam: Add tuple_{insert, delete, update, lock} and use.

> Greetings,
>
> Andres Freund

Hello,
I finished bisect and recheck the result.

ad308058cc8666c50b43179e64d6bb7aeb3ba169 is the first bad commit
commit ad308058cc8666c50b43179e64d6bb7aeb3ba169
Author: Thomas Munro <tmunro(at)postgresql(dot)org>
Date: Thu Mar 28 10:59:19 2019 +1300

Use FullTransactionId for the transaction stack.

Provide GetTopFullTransactionId() and GetCurrentFullTransactionId().
The intended users of these interfaces are access methods that use
xids for visibility checks but don't want to have to go back and
"freeze" existing references some time later before the 32 bit xid
counter wraps around.

Use a new struct to serialize the transaction state for parallel
query, because FullTransactionId doesn't fit into the previous
serialization scheme very well.

Author: Thomas Munro
Reviewed-by: Heikki Linnakangas
Discussion:
https://postgr.es/m/CAA4eK1%2BMv%2Bmb0HFfWM9Srtc6MVe160WFurXV68iAFMcagRZ0dQ%40mail.gmail.com

:040000 040000 3080a54b6840daf3e08ada1edfb8d5d61c1099c6
4663f00a917eb526d27da29d1c059e20c58d4b34 M src

I reproduced the error on two servers.
First server, commit 16954e2:
PostgreSQL 12devel-master/16954e2 on x86_64-pc-linux-gnu, compiled by
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

[New LWP 8346]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test [local] UPDATE
'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fd37aca9428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fd37aca9428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fd37acab02a in __GI_abort () at abort.c:89
#2 0x0000000000aa52a7 in ExceptionalCondition (conditionName=0xb47688
"!((((tuple->t_data->t_infomask) & 0x0080) ||
(((tuple->t_data->t_infomask) & (0x1000 | ((0x0040 | 0x0010) | 0x0040 |
0x0010))) == 0x0040)))",
errorType=0xb432e9 "FailedAssertion", fileName=0xb43368 "heapam.c",
lineNumber=4289) at assert.c:54
#3 0x00000000004ea31d in heap_lock_tuple (relation=0x7fd37c3260b0,
tuple=0x2ec61a0, cid=0, mode=LockTupleNoKeyExclusive,
wait_policy=LockWaitBlock, follow_updates=false, buffer=0x7ffd787bf774,
tmfd=0x7ffd787bf940)
at heapam.c:4289
#4 0x00000000004f5c3e in heapam_tuple_lock (relation=0x7fd37c3260b0,
tid=0x7ffd787bf9b0, snapshot=0x2e731b0, slot=0x2ec6150, cid=0,
mode=LockTupleNoKeyExclusive, wait_policy=LockWaitBlock, flags=2 '\002',
tmfd=0x7ffd787bf940) at heapam_handler.c:361
#5 0x0000000000745726 in table_lock_tuple (rel=0x7fd37c3260b0,
tid=0x7ffd787bf9b0, snapshot=0x2e731b0, slot=0x2ec6150, cid=0,
mode=LockTupleNoKeyExclusive, wait_policy=LockWaitBlock, flags=2 '\002',
tmfd=0x7ffd787bf940) at ../../../src/include/access/tableam.h:1228
#6 0x000000000074794d in ExecUpdate (mtstate=0x2eb6a50,
tupleid=0x7ffd787bf9b0, oldtuple=0x0, slot=0x2eb8498,
planSlot=0x2eb8320, epqstate=0x2eb6b48, estate=0x2eb66a8,
canSetTag=true) at nodeModifyTable.c:1369
#7 0x0000000000748f39 in ExecModifyTable (pstate=0x2eb6a50) at
nodeModifyTable.c:2216
#8 0x0000000000717c7d in ExecProcNodeFirst (node=0x2eb6a50) at
execProcnode.c:445
#9 0x000000000070c9e6 in ExecProcNode (node=0x2eb6a50) at
../../../src/include/executor/executor.h:239
#10 0x000000000070f265 in ExecutePlan (estate=0x2eb66a8,
planstate=0x2eb6a50, use_parallel_mode=false, operation=CMD_UPDATE,
sendTuples=false, numberTuples=0, direction=ForwardScanDirection,
dest=0x2eb97d8,
execute_once=true) at execMain.c:1648
#11 0x000000000070d023 in standard_ExecutorRun (queryDesc=0x2e16688,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:365
#12 0x000000000070ce5f in ExecutorRun (queryDesc=0x2e16688,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:309
#13 0x000000000091d5e0 in ProcessQuery (plan=0x2eaa3f0,
sourceText=0x2dee1c8 "UPDATE pgbench_accounts SET abalance = 1 WHERE aid
= 1;", params=0x0, queryEnv=0x0, dest=0x2eb97d8,
completionTag=0x7ffd787bfe30 "")
at pquery.c:161
#14 0x000000000091f02c in PortalRunMulti (portal=0x2e557c8,
isTopLevel=true, setHoldSnapshot=false, dest=0x2eb97d8,
altdest=0x2eb97d8, completionTag=0x7ffd787bfe30 "") at pquery.c:1283
#15 0x000000000091e578 in PortalRun (portal=0x2e557c8,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x2eb97d8, altdest=0x2eb97d8, completionTag=0x7ffd787bfe30 "") at
pquery.c:796
#16 0x0000000000917ee9 in exec_simple_query (query_string=0x2dee1c8
"UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 1;") at
postgres.c:1215
#17 0x000000000091c58f in PostgresMain (argc=1, argv=0x2e198c0,
dbname=0x2e197a0 "test", username=0x2dead88 "zharkov") at
postgres.c:4249
#18 0x000000000086afb2 in BackendRun (port=0x2e0f170) at
postmaster.c:4430
#19 0x000000000086a713 in BackendStartup (port=0x2e0f170) at
postmaster.c:4121
#20 0x0000000000866952 in ServerLoop () at postmaster.c:1704
#21 0x0000000000866160 in PostmasterMain (argc=3, argv=0x2de8cf0) at
postmaster.c:1377
#22 0x00000000007803ba in main (argc=3, argv=0x2de8cf0) at main.c:228

Second server, commit 16954e2:
PostgreSQL 12devel-master/16954e22e2 on x86_64-pc-linux-gnu, compiled by
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

[New LWP 6145]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test [local] UPDATE
'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fc5cfa2dfcf in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007fc5cfa2dfcf in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007fc5cfa2f3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00005580a00b5590 in ExceptionalCondition (
conditionName=0x5580a01550e8 "!((((tuple->t_data->t_infomask) &
0x0080) || (((tuple->t_data->t_infomask) & (0x1000 | ((0x0040 | 0x0010)
| 0x0040 | 0x0010))) == 0x0040)))",
errorType=0x5580a0150d49 "FailedAssertion", fileName=0x5580a0150dc8
"heapam.c", lineNumber=4289) at assert.c:54
#3 0x000055809fb1e26c in heap_lock_tuple (relation=0x7fc5d10b6c00,
tuple=0x5580a0bd3e20, cid=0, mode=LockTupleNoKeyExclusive,
wait_policy=LockWaitBlock, follow_updates=false,
buffer=0x7ffe391f876c, tmfd=0x7ffe391f8880) at heapam.c:4289
#4 0x000055809fb2980e in heapam_tuple_lock (relation=0x7fc5d10b6c00,
tid=0x7ffe391f895a, snapshot=0x5580a0b280e0, slot=0x5580a0bd3dd0, cid=0,
mode=LockTupleNoKeyExclusive,
wait_policy=LockWaitBlock, flags=2 '\002', tmfd=0x7ffe391f8880) at
heapam_handler.c:361
#5 0x000055809fd68ba0 in table_lock_tuple (rel=0x7fc5d10b6c00,
tid=0x7ffe391f895a, snapshot=0x5580a0b280e0, slot=0x5580a0bd3dd0, cid=0,
mode=LockTupleNoKeyExclusive,
wait_policy=LockWaitBlock, flags=2 '\002', tmfd=0x7ffe391f8880) at
../../../src/include/access/tableam.h:1228
#6 0x000055809fd6ace3 in ExecUpdate (mtstate=0x5580a0bca7d0,
tupleid=0x7ffe391f895a, oldtuple=0x0, slot=0x5580a0bcc218,
planSlot=0x5580a0bcc0a0, epqstate=0x5580a0bca8c8,
estate=0x5580a0bca428, canSetTag=true) at nodeModifyTable.c:1369
#7 0x000055809fd6c2da in ExecModifyTable (pstate=0x5580a0bca7d0) at
nodeModifyTable.c:2216
#8 0x000055809fd3b65d in ExecProcNodeFirst (node=0x5580a0bca7d0) at
execProcnode.c:445
#9 0x000055809fd304b7 in ExecProcNode (node=0x5580a0bca7d0) at
../../../src/include/executor/executor.h:239
#10 0x000055809fd32e7a in ExecutePlan (estate=0x5580a0bca428,
planstate=0x5580a0bca7d0, use_parallel_mode=false, operation=CMD_UPDATE,
sendTuples=false, numberTuples=0,
direction=ForwardScanDirection, dest=0x5580a0bcd558,
execute_once=true) at execMain.c:1648
#11 0x000055809fd30b0d in standard_ExecutorRun
(queryDesc=0x5580a0b2a8d8, direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:365
#12 0x000055809fd30931 in ExecutorRun (queryDesc=0x5580a0b2a8d8,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:309
#13 0x000055809ff3a439 in ProcessQuery (plan=0x5580a0bbe170,
sourceText=0x5580a0b031a8 "UPDATE pgbench_accounts SET abalance = 1
WHERE aid = 3;", params=0x0, queryEnv=0x0,
dest=0x5580a0bcd558, completionTag=0x7ffe391f8d60 "") at
pquery.c:161
#14 0x000055809ff3bdaf in PortalRunMulti (portal=0x5580a0b69958,
isTopLevel=true, setHoldSnapshot=false, dest=0x5580a0bcd558,
altdest=0x5580a0bcd558, completionTag=0x7ffe391f8d60 "")
at pquery.c:1283
#15 0x000055809ff3b366 in PortalRun (portal=0x5580a0b69958,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x5580a0bcd558, altdest=0x5580a0bcd558,
completionTag=0x7ffe391f8d60 "") at pquery.c:796
#16 0x000055809ff34e8d in exec_simple_query (query_string=0x5580a0b031a8
"UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 3;") at
postgres.c:1215
#17 0x000055809ff394ca in PostgresMain (argc=1, argv=0x5580a0b2da50,
dbname=0x5580a0b2d930 "test", username=0x5580a0affd68 "zharkov") at
postgres.c:4249
#18 0x000055809fe8aa98 in BackendRun (port=0x5580a0b256f0) at
postmaster.c:4430
#19 0x000055809fe8a201 in BackendStartup (port=0x5580a0b256f0) at
postmaster.c:4121
#20 0x000055809fe863b6 in ServerLoop () at postmaster.c:1704
#21 0x000055809fe85be4 in PostmasterMain (argc=3, argv=0x5580a0afdcd0)
at postmaster.c:1377
#22 0x000055809fda294e in main (argc=3, argv=0x5580a0afdcd0) at
main.c:228

On the first bad commit:
First server:
PostgreSQL 12devel-HEAD/ad30805 on x86_64-pc-linux-gnu, compiled by gcc
(Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

[New LWP 25344]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test [local] UPDATE
'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007fc51e46e428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fc51e46e428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fc51e47002a in __GI_abort () at abort.c:89
#2 0x0000000000a98e22 in errfinish (dummy=0) at elog.c:552
#3 0x0000000000a9b3c9 in elog_finish (elevel=22, fmt=0xb66160 "cannot
abort transaction %u, it was already committed") at elog.c:1365
#4 0x0000000000561df4 in RecordTransactionAbort (isSubXact=false) at
xact.c:1644
#5 0x0000000000562e00 in AbortTransaction () at xact.c:2670
#6 0x0000000000563683 in AbortCurrentTransaction () at xact.c:3191
#7 0x00000000009113f6 in PostgresMain (argc=1, argv=0x240e8b0,
dbname=0x240e790 "test", username=0x23dfd98 "zharkov") at
postgres.c:4033
#8 0x0000000000861064 in BackendRun (port=0x2404160) at
postmaster.c:4399
#9 0x00000000008607c5 in BackendStartup (port=0x2404160) at
postmaster.c:4090
#10 0x000000000085cac7 in ServerLoop () at postmaster.c:1703
#11 0x000000000085c2d5 in PostmasterMain (argc=3, argv=0x23ddd00) at
postmaster.c:1376
#12 0x0000000000777edf in main (argc=3, argv=0x23ddd00) at main.c:228

Second server:
PostgreSQL 12devel-HEAD/ad308058cc on x86_64-pc-linux-gnu, compiled by
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

[New LWP 11211]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test [local] UPDATE
'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007feda4a2cfcf in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007feda4a2cfcf in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007feda4a2e3fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x000055802b411af6 in errfinish (dummy=0) at elog.c:552
#3 0x000055802b414035 in elog_finish (elevel=22, fmt=0x55802b4dbb60
"cannot abort transaction %u, it was already committed") at elog.c:1365
#4 0x000055802aefbd39 in RecordTransactionAbort (isSubXact=false) at
xact.c:1644
#5 0x000055802aefcdbd in AbortTransaction () at xact.c:2670
#6 0x000055802aefd6c7 in AbortCurrentTransaction () at xact.c:3191
#7 0x000055802b296b90 in PostgresMain (argc=1, argv=0x55802d723a30,
dbname=0x55802d723910 "test", username=0x55802d6f5d78 "zharkov") at
postgres.c:4033
#8 0x000055802b1e9059 in BackendRun (port=0x55802d71a1f0) at
postmaster.c:4399
#9 0x000055802b1e87c2 in BackendStartup (port=0x55802d71a1f0) at
postmaster.c:4090
#10 0x000055802b1e4a44 in ServerLoop () at postmaster.c:1703
#11 0x000055802b1e4272 in PostmasterMain (argc=3, argv=0x55802d6f3ce0)
at postmaster.c:1376
#12 0x000055802b10299f in main (argc=3, argv=0x55802d6f3ce0) at
main.c:228

The error does not reproduce on commit 2fc7af5 on both servers.

--
regards,
Roman Zharkov

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2019-04-11 03:43:49 Re: Suspicious strcmp() in src/backend/parser/parse_expr.c
Previous Message Thomas Munro 2019-04-11 03:26:31 Re: Suspicious strcmp() in src/backend/parser/parse_expr.c