Re: backend crash on DELETE, reproducible locally

From: Ondřej Bouda <bouda(at)edookit(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org, Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
Subject: Re: backend crash on DELETE, reproducible locally
Date: 2018-11-06 18:31:50
Message-ID: d1823964-157d-34f5-f359-79da00872861@edookit.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Dne 3.11.2018 v 20:38 Tom Lane napsal(a):
> Yeah, dump/reload would make the problem go away.
>
> Same bug.

So we dumped and restored all our databases. After that, the crash on
DELETE never occurred (before, it was several times a day). However, the
crash on UPDATE still occurs on specific rows. We are quite certain no
ALTER TABLE statement was executed on the table after the restore.
There are two AFTER INSERT OR UPDATE constraint triggers and one BEFORE
INSERT OR UPDATE trigger on the table, all of which are implemented in
plpgsql. Multiple physical servers, on separate databases with identical
schema, crash on the same type of UPDATE query (different just in
concrete values to be updated). The same code worked perfectly on 10.x.

See the attached backtrace below. Can we do something else to catch the
bug? Or can we hope for this bug to be already fixed and released in the
upcoming version?

Regards,
Ondřej Bouda

[New LWP 8307]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: 11/main: usr mydb 127.0.0.1(56829)
UPDATE '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:793
#0 __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:793
No locals.
#1 0x00007f6a6b993296 in datumIsEqual (value1=12, value2=12,
typByVal=<optimized out>, typLen=16) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/utils/adt/datum.c:249
size1 = 16
size2 = <optimized out>
s1 = 0xc <error: Cannot access memory at address 0xc>
s2 = 0xc <error: Cannot access memory at address 0xc>
res = <optimized out>
#2 0x00007f6a6b6b89de in ProjIndexIsUnchanged (newtup=0x7f6a6d5ccdd0,
oldtup=0x7ffc9c1c1270, relation=0x7f6a6b4c3680) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/access/heap/heapam.c:4539
att = <optimized out>
indexOid = <optimized out>
indexDesc = 0x7f6a6b4c7cf0
indexInfo = 0x7f6a6d5cdbf8
i = 0
indexoidlist = <optimized out>
econtext = 0x7f6a6d4c2c10
new_isnull = {false, false, false, 156, 252, 127, false, false,
44, 29, 128, 107, 106, 127, false, false, 203, 89, 76, 107, 106, 127,
false, false, 123, false, false, false, false, false, false, false}
slot = 0x7f6a6d5cf8b8
equals = true
old_isnull = {false, false, false, 109, 106, 127, false, false,
69, 46, 80, 109, 106, 127, false, false, 64, 46, 80, 109, 106, 127,
false, false, 72, 46, 80, 109, 106, 127, false, false}
new_values = {12, 2, 140095078059546, 16, 140722927572064,
140095077232192, 140095077233456, 140722927572079, 0, 0,
140722927572048, 140095054460096, 140095077050704, 140095078061816, 5,
4, 140722927572096, 140095046814123, 140722927572096, 140095046825162,
140095043418528, 140095077051552, 140722927572224, 140095046865098,
60129542543, 23274993280, 140095043418528, 140095077050704,
140095043417728, 140095077050976, 140095078056288, 140095043422512}
indexno = 2
l = 0x7f6a6d503840
estate = 0x7f6a6d4c2a00
old_values = {12, 1, 140078631611970, 1, 140095077050704, 1024,
140095077232192, 0, 140722927571760, 140095049517261, 140722927571808,
140722927571776, 140722927571792, 140095046814329, 18, 140095077232200,
140095076976896, 140095078056288, 22, 100, 140722927571856,
140722927571920, 140095046831813, 140722927571856, 140095046842624, 0,
140095077226392, 3, 140095077232192, 140095077232528, 140095077232192,
140095078061816}
#3 heap_update (relation=relation(at)entry=0x7f6a6b4c3680,
otid=otid(at)entry=0x7ffc9c1c1690, newtup=newtup(at)entry=0x7f6a6d5ccdd0,
cid=0, crosscheck=<optimized out>, wait=wait(at)entry=true,
hufd=hufd(at)entry=0x7ffc9c1c15b0, lockmode=lockmode(at)entry=0x7ffc9c1c15a4)
at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/access/heap/heapam.c:4230
result = <optimized out>
xid = 245762845
hot_attrs = 0x7f6a6d503740
proj_idx_attrs = 0x7f6a6d503758
key_attrs = 0x7f6a6d503770
id_attrs = 0x7f6a6d503788
interesting_attrs = 0x7f6a6d5037a0
lp = <optimized out>
oldtup = {t_len = 85, t_self = {ip_blkid = {bi_hi = 0, bi_lo =
45}, ip_posid = 10}, t_tableOid = 557732818, t_data = 0x7f669913ee10}
heaptup = 0x7f6a6d5ccdd0
old_key_tuple = 0x0
old_key_copied = false
page = 0x7f669913d180 "]="
block = <optimized out>
mxact_status = <optimized out>
buffer = <optimized out>
newbuf = 14683
vmbuffer = 0
vmbuffer_new = 0
need_toast = <optimized out>
newtupsize = <optimized out>
pagefree = <optimized out>
have_tuple_lock = false
iscombo = false
use_hot_update = false
hot_attrs_checked = <optimized out>
key_intact = <optimized out>
all_visible_cleared = false
all_visible_cleared_new = false
checked_lockers = <optimized out>
locker_remains = <optimized out>
xmax_new_tuple = <optimized out>
xmax_old_tuple = 245762845
infomask_old_tuple = 0
infomask2_old_tuple = 8192
infomask_new_tuple = 144
infomask2_new_tuple = 0
__func__ = "heap_update"
#4 0x00007f6a6b82f2c9 in ExecUpdate
(mtstate=mtstate(at)entry=0x7f6a6d4d81c0, tupleid=0x7ffc9c1c1690,
oldtuple=0x0, slot=0x7f6a6d5cc160,
planSlot=planSlot(at)entry=0x7f6a6d567578,
epqstate=epqstate(at)entry=0x7f6a6d4d8280,
estate=estate(at)entry=0x7f6a6d4d6a60, canSetTag=true) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/executor/nodeModifyTable.c:1209
lockmode = LockTupleNoKeyExclusive
partition_constraint_failed = false
tuple = 0x7f6a6d5ccdd0
resultRelInfo = 0x7f6a6d4d6ca0
resultRelationDesc = 0x7f6a6b4c3680
result = <optimized out>
hufd = {ctid = {ip_blkid = {bi_hi = 5616, bi_lo = 39964},
ip_posid = 32764}, xmax = 1803594893, cmax = 32618}
recheckIndexes = 0x0
saved_tcs_map = 0x0
__func__ = "ExecUpdate"
#5 0x00007f6a6b82f978 in ExecModifyTable (pstate=0x7f6a6d4d81c0) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/executor/nodeModifyTable.c:2166
node = 0x7f6a6d4d81c0
proute = 0x0
estate = 0x7f6a6d4d6a60
operation = CMD_UPDATE
saved_resultRelInfo = 0x0
resultRelInfo = 0x7f6a6d4d6ca0
subplanstate = 0x7f6a6d566988
junkfilter = 0x7f6a6d5cc038
slot = <optimized out>
tupleid = <optimized out>
tuple_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 45}, ip_posid = 10}
oldtupdata = {t_len = 2619086544, t_self = {ip_blkid = {bi_hi =
32764, bi_lo = 0}, ip_posid = 9204}, t_tableOid = 32618, t_data =
0x7ffc9c1c16d0}
oldtuple = <optimized out>
__func__ = "ExecModifyTable"
#6 0x00007f6a6b80b25b in ExecProcNode (node=0x7f6a6d4d81c0) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/include/executor/executor.h:237
No locals.
#7 ExecutePlan (execute_once=<optimized out>, dest=0x7f6a6d5e9448,
direction=<optimized out>, numberTuples=0, sendTuples=false,
operation=CMD_UPDATE, use_parallel_mode=<optimized out>,
planstate=0x7f6a6d4d81c0, estate=0x7f6a6d4d6a60) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/executor/execMain.c:1723
slot = <optimized out>
current_tuple_count = 0
#8 standard_ExecutorRun (queryDesc=0x7f6a6d566040, direction=<optimized
out>, count=0, execute_once=<optimized out>) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/executor/execMain.c:364
estate = 0x7f6a6d4d6a60
operation = CMD_UPDATE
dest = 0x7f6a6d5e9448
sendTuples = <optimized out>
__func__ = "standard_ExecutorRun"
#9 0x00007f6a6b95b41d in ProcessQuery (plan=<optimized out>,
sourceText=0x7f6a6d3b45a0 "UPDATE public.schedulecard SET
ext_ident=NULL, finalreport_absence_cnt=NULL, period_day=2,
period_num=1, rotates=false, rotationgroup_id=NULL,
schedulecard_id=NULL, schedulecardblock_id=NULL, schedule"...,
params=0x0, queryEnv=0x0, dest=0x7f6a6d5e9448,
completionTag=0x7ffc9c1c19d0 "") at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/tcop/pquery.c:161
queryDesc = 0x7f6a6d566040
#10 0x00007f6a6b95b668 in PortalRunMulti
(portal=portal(at)entry=0x7f6a6d479d00, isTopLevel=isTopLevel(at)entry=true,
setHoldSnapshot=setHoldSnapshot(at)entry=false,
dest=dest(at)entry=0x7f6a6d5e9448, altdest=altdest(at)entry=0x7f6a6d5e9448,
completionTag=completionTag(at)entry=0x7ffc9c1c19d0 "") at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/tcop/pquery.c:1286
pstmt = 0x7f6a6d5e92e8
active_snapshot_set = true
stmtlist_item = 0x7f6a6d5e93f8
#11 0x00007f6a6b95c2d5 in PortalRun (portal=portal(at)entry=0x7f6a6d479d00,
count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=true,
run_once=run_once(at)entry=true, dest=dest(at)entry=0x7f6a6d5e9448,
altdest=altdest(at)entry=0x7f6a6d5e9448,
completionTag=completionTag(at)entry=0x7ffc9c1c19d0 "") at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/tcop/pquery.c:799
save_exception_stack = 0x7ffc9c1c1c00
save_context_stack = 0x0
local_sigjmp_buf = {{__jmpbuf = {140722927574016,
7153736072036697610, 140095075869728, 140722927573456, 140095076670720,
140095075869776, 7153736071992657418, 7091693995611473418},
__mask_was_saved = 0, __saved_mask = {__val = {0, 140723125634669, 0,
140095051070678, 64, 112, 140095075869728, 140095075861648,
140095078175768, 140722927573328, 140095049517496, 140722927573360, 2,
140095075869728, 2, 140722927573360}}}}
result = <optimized out>
nprocessed = <optimized out>
saveTopTransactionResourceOwner = 0x7f6a6d41b148
saveTopTransactionContext = 0x7f6a6d48d8d0
saveActivePortal = 0x0
saveResourceOwner = 0x7f6a6d41b148
savePortalContext = 0x0
saveMemoryContext = 0x7f6a6d48d8d0
__func__ = "PortalRun"
#12 0x00007f6a6b957f7f in exec_simple_query (query_string=0x7f6a6d3b45a0
"UPDATE public.schedulecard SET ext_ident=NULL,
finalreport_absence_cnt=NULL, period_day=2, period_num=1, rotates=false,
rotationgroup_id=NULL, schedulecard_id=NULL, schedulecardblock_id=NULL,
schedule"...) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/tcop/postgres.c:1122
parsetree = 0x7f6a6d3b6420
portal = 0x7f6a6d479d00
snapshot_set = <optimized out>
commandTag = <optimized out>
completionTag =
"\000\032\034\234\374\177\000\000\341\356\247kj\177\000\000\000\000\000\000\000\000\000\000\060\316\367kj\177\000\000\340&\360kj\177\000\000\234U\357kj\177\000\000\240E;mj\177\000\000Q\000\000\000\000\000\000"
querytree_list = <optimized out>
plantree_list = <optimized out>
receiver = 0x7f6a6d5e9448
format = 0
dest = DestRemote
parsetree_list = 0x7f6a6d3b6470
parsetree_item = 0x7f6a6d3b6450
save_log_statement_stats = false
was_logged = false
use_implicit_block = false
msec_str =
"\000\032\034\234\374\177\000\000\341\356\247kj\177\000\000\000\000\000\000\000\000\000\000\060\316\367kj\177\000"
__func__ = "exec_simple_query"
#13 0x00007f6a6b95930a in PostgresMain (argc=<optimized out>,
argv=argv(at)entry=0x7f6a6d400400, dbname=0x7f6a6d4003e0 "mydb",
username=<optimized out>) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/tcop/postgres.c:4159
query_string = 0x7f6a6d3b45a0 "UPDATE public.schedulecard SET
ext_ident=NULL, finalreport_absence_cnt=NULL, period_day=2,
period_num=1, rotates=false, rotationgroup_id=NULL,
schedulecard_id=NULL, schedulecardblock_id=NULL, schedule"...
firstchar = 81
input_message = {data = 0x7f6a6d3b45a0 "UPDATE
public.schedulecard SET ext_ident=NULL, finalreport_absence_cnt=NULL,
period_day=2, period_num=1, rotates=false, rotationgroup_id=NULL,
schedulecard_id=NULL, schedulecardblock_id=NULL, schedule"..., len =
328, maxlen = 1024, cursor = 328}
local_sigjmp_buf = {{__jmpbuf = {140095054460088,
7153736071902479882, 140095054460096, 1, 0, 140095076143920,
7153736071933937162, 7091693993451144714}, __mask_was_saved = 1,
__saved_mask = {__val = {0, 140095076143920, 140095004765576,
844424930131970, 206158430256, 140722927574360, 140722927574144, 293,
140095076143920, 0, 0, 140095075861648, 1024, 140722927574468, 0,
140722927574240}}}}
send_ready_for_query = false
disable_idle_in_transaction_timeout = false
__func__ = "PostgresMain"
#14 0x00007f6a6b67867d in BackendRun (port=0x7f6a6d3f9330) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/postmaster/postmaster.c:4361
ac = 1
secs = 594683937
usecs = 521905
i = 1
av = 0x7f6a6d400400
maxac = <optimized out>
#15 BackendStartup (port=0x7f6a6d3f9330) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/postmaster/postmaster.c:4033
bn = <optimized out>
pid = <optimized out>
#16 ServerLoop () at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/postmaster/postmaster.c:1706
rmask = {fds_bits = {8, 0 <repeats 15 times>}}
selres = <optimized out>
now = <optimized out>
readmask = {fds_bits = {200, 0 <repeats 15 times>}}
last_lockfile_recheck_time = 1541368681
last_touch_time = 1541366480
__func__ = "ServerLoop"
#17 0x00007f6a6b8e7644 in PostmasterMain (argc=5, argv=<optimized out>)
at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/postmaster/postmaster.c:1379
opt = <optimized out>
status = <optimized out>
userDoption = <optimized out>
listen_addr_saved = true
i = <optimized out>
output_config_variable = <optimized out>
__func__ = "PostmasterMain"
#18 0x00007f6a6b679616 in main (argc=5, argv=0x7f6a6d3af1f0) at
/build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/main/main.c:228
No locals.

Dne 3.11.2018 v 20:38 Tom Lane napsal(a):
> <obouda(at)email(dot)cz> writes:
>> we reached the exactly same problem after upgrading to PostgreSQL 11 - the
>> server crashed on a DELETE statement with a trigger. We also observed an
>> AFTER DELETE trigger receiving NULL values in OLD. Now I see the problem
>> seems to be solved (theoretically). Unfortunately, we are not able to build
>> the server with the patch, so we cannot confirm that. However, when we just
>> copied the database (within the same server), the same DELETE executed on
>> the copy with no problems.
>
> Yeah, dump/reload would make the problem go away.
>
>> I would like to ask, however: could the same problem arise from an UPDATE
>> statement (also on a table with an AFTER trigger), or would that be another
>> bug (technical details below)?
>
> Same bug.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-11-06 18:41:21 Re: why select count(*) consumes wal logs
Previous Message Michael Nolan 2018-11-06 18:27:02 Re: why select count(*) consumes wal logs

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-11-06 18:52:55 Re: backend crash on DELETE, reproducible locally
Previous Message Tom Lane 2018-11-06 18:28:21 Re: plruby: rb_iterate symbol clash with libruby.so