Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed

From: r(dot)zharkov(at)postgrespro(dot)ru
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
Date: 2019-04-03 12:59:23
Message-ID: 994bff111c549f45041cd1dc7473548e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2019-04-03 19:20, Alvaro Herrera wrote:
> On 2019-Apr-03, r(dot)zharkov(at)postgrespro(dot)ru wrote:
>
>> #0 0x00007f5e0dc7b428 in __GI_raise (sig=sig(at)entry=6) at
>> ../sysdeps/unix/sysv/linux/raise.c:54
>> #1 0x00007f5e0dc7d02a in __GI_abort () at abort.c:89
>> #2 0x0000000000a9b7ce in ExceptionalCondition (
>> conditionName=0xb3de08 "!((((tuple->t_data->t_infomask) & 0x0080)
>> ||
>> (((tuple->t_data->t_infomask) & (0x1000 | ((0x0040 | 0x0010) | 0x0040
>> |
>> 0x0010))) == 0x0040)))",
>> errorType=0xb39a69 "FailedAssertion", fileName=0xb39ae8
>> "heapam.c",
>> lineNumber=4286) at assert.c:54
>> #3 0x00000000004ea024 in heap_lock_tuple (relation=0x7f5e0f252d80,
>> tuple=0x30f4dd0, cid=0, mode=LockTupleNoKeyExclusive,
>> wait_policy=LockWaitBlock, follow_updates=false,
>> buffer=0x7ffed968cbb4, tmfd=0x7ffed968cd80) at heapam.c:4286
>
> Uhm ... this assertion failure is doesn't seem related to the other
> problem you were reporting. Here, it's failing because it thinks the
> tuple should be marked as only locked when the current transaction is
> the only Xmax for the tuple.
>
> TransactionIdIsCurrentTransactionId(xwait))
> {
> /* ... but if the xmax changed in the meantime, start over */
> LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
> if (xmax_infomask_changed(tuple->t_data->t_infomask, infomask) ||
> !TransactionIdEquals(HeapTupleHeaderGetRawXmax(tuple->t_data),
> xwait))
> goto l3;
> Assert(HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_data->t_infomask)); //
> <--- failed assertion
> require_sleep = false;
> }
>
> I'm not quite sure what's up with that.

Today we reproduced the error without the pooller. All server settings
are the same.

pgbench test -j 32 -T 36000 -f ycsb_read_zipf.sql -f
ycsb_update_zipf.sql -c 32 -P 60

Backtrace:

[New LWP 29710]
[Thread debugging using libthread_db enabled]
Using host libthread_db library
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test [local] UPDATE
'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f0db56c3428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f0db56c3428 in __GI_raise (sig=sig(at)entry=6) at
../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f0db56c502a in __GI_abort () at abort.c:89
#2 0x0000000000a9c0ac in errfinish (dummy=0) at elog.c:552
#3 0x0000000000a9e653 in elog_finish (elevel=22, fmt=0xc7c7c0
"unexpected table_lock_tuple status: %u") at elog.c:1365
#4 0x00000000007422c1 in ExecUpdate (mtstate=0x2e02370,
tupleid=0x7fff6a5c1610, oldtuple=0x0, slot=0x2e03db8,
planSlot=0x2e03c40, epqstate=0x2e02468, estate=0x2e01fc8,
canSetTag=true) at nodeModifyTable.c:1272
#5 0x0000000000743756 in ExecModifyTable (pstate=0x2e02370) at
nodeModifyTable.c:2070
#6 0x0000000000712671 in ExecProcNodeFirst (node=0x2e02370) at
execProcnode.c:445
#7 0x000000000070748c in ExecProcNode (node=0x2e02370) at
../../../src/include/executor/executor.h:239
#8 0x0000000000709d11 in ExecutePlan (estate=0x2e01fc8,
planstate=0x2e02370, use_parallel_mode=false, operation=CMD_UPDATE,
sendTuples=false, numberTuples=0, direction=ForwardScanDirection,
dest=0x2e050f8,
execute_once=true) at execMain.c:1643
#9 0x0000000000707ac9 in standard_ExecutorRun (queryDesc=0x2d56a18,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:362
#10 0x0000000000707905 in ExecutorRun (queryDesc=0x2d56a18,
direction=ForwardScanDirection, count=0, execute_once=true) at
execMain.c:306
#11 0x0000000000915950 in ProcessQuery (plan=0x2df5d10,
sourceText=0x2d2d1d8 "UPDATE pgbench_accounts SET abalance = 1 WHERE aid
= 1;", params=0x0, queryEnv=0x0, dest=0x2e050f8,
completionTag=0x7fff6a5c1a90 "")
at pquery.c:161
#12 0x000000000091739c in PortalRunMulti (portal=0x2daf078,
isTopLevel=true, setHoldSnapshot=false, dest=0x2e050f8,
altdest=0x2e050f8, completionTag=0x7fff6a5c1a90 "") at pquery.c:1283
#13 0x00000000009168e8 in PortalRun (portal=0x2daf078,
count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x2e050f8, altdest=0x2e050f8, completionTag=0x7fff6a5c1a90 "") at
pquery.c:796
#14 0x000000000091027a in exec_simple_query (query_string=0x2d2d1d8
"UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 1;") at
postgres.c:1215
#15 0x00000000009148ff in PostgresMain (argc=1, argv=0x2d59a80,
dbname=0x2d59960 "test", username=0x2d29d98 "zharkov") at
postgres.c:4247
#16 0x000000000086406c in BackendRun (port=0x2d52030) at
postmaster.c:4399
#17 0x00000000008637cd in BackendStartup (port=0x2d52030) at
postmaster.c:4090
#18 0x000000000085facf in ServerLoop () at postmaster.c:1703
#19 0x000000000085f2dd in PostmasterMain (argc=3, argv=0x2d27d00) at
postmaster.c:1376
#20 0x000000000077aeba in main (argc=3, argv=0x2d27d00) at main.c:228

-------------------------------------------

Some information from first core dump ( thanks to Konstantin Knizhnik ):

(gdb) p *tuple->t_data
$6 = {t_choice = {t_heap = {t_xmin = 200004728, t_xmax = 200004747,
t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 200004728,
datum_typmod = 200004747, datum_typeid = 0}}, t_ctid = {ip_blkid =
{bi_hi = 0, bi_lo = 2337}, ip_posid = 61}, t_infomask2 = 49156,
t_infomask = 9474,
t_hoff = 24 '\030', t_bits = 0x7f59f9ea1c17 ""}
(gdb) p xwait
$4 = 200004747
(gdb) p /x infomask
$7 = 0x2502
(gdb) p result
$8 = TM_Updated
(gdb) p mode
$9 = LockTupleNoKeyExclusive
(gdb) p *CurrentTransactionState
$11 = {fullTransactionId = {value = 200004747}, subTransactionId = 1,
name = 0x0, savepointLevel = 0, state = TRANS_INPROGRESS, blockState =
TBLOCK_STARTED,
nestingLevel = 1, gucNestLevel = 1, curTransactionContext = 0x303cda0,
curTransactionOwner = 0x3054660, childXids = 0x0, nChildXids = 0,
maxChildXids = 0,
prevUser = 10, prevSecContext = 0, prevXactReadOnly = false,
startedInRecovery = false, didLogXid = false, parallelModeLevel = 0,
chain = false, parent = 0x0}

--
regards

Roman Zharkov

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michał Iwańczuk 2019-04-03 13:29:46 Bug report - incorrect value displayed in jsonb column for large numbers
Previous Message Alvaro Herrera 2019-04-03 12:20:27 Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed