Re: backend crash on DELETE, reproducible locally

From: <obouda(at)email(dot)cz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>, <pgsql-general(at)lists(dot)postgresql(dot)org>, "Karsten Hilbert" <Karsten(dot)Hilbert(at)gmx(dot)net>
Subject: Re: backend crash on DELETE, reproducible locally
Date: 2018-11-03 19:19:19
Message-ID: 4kp.XAMr.3k5V4ORoB1t.1RtVIt@seznam.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hello,

we reached the exactly same problem after upgrading to PostgreSQL 11 - the
server crashed on a DELETE statement with a trigger. We also observed an
AFTER DELETE trigger receiving NULL values in OLD. Now I see the problem
seems to be solved (theoretically). Unfortunately, we are not able to build
the server with the patch, so we cannot confirm that. However, when we just
copied the database (within the same server), the same DELETE executed on
the copy with no problems.

I would like to ask, however: could the same problem arise from an UPDATE
statement (also on a table with an AFTER trigger), or would that be another
bug (technical details below)?

As the bug causes our production servers segfault several times a day, we'd
like to solve the problem as soon as possible. Do I understand it correctly
that if we dump and restore the database, the bug should not occur (until
the next ALTER TABLE ADD COLUMN - which we'll avoid until the patch is
released)?

With the update, we caught this (we also have the core dump and could
provide it privately if useful):

#0  __memcmp_sse4_1 () at ../sysdeps/x86_64/multiarch/memcmp-sse4.S:793

No locals.

#1  0x00007f6a6b993296 in datumIsEqual (value1=7, value2=7, typByVal=<
optimized out>, typLen=16) at /build/postgresql-11-oDwOcQ/postgresql-11-
11.0/build/../src/backend/utils/adt/datum.c:249

        size1 = 16

        size2 = <optimized out>

        s1 = 0x7 <error: Cannot access memory at address 0x7>

        s2 = 0x7 <error: Cannot access memory at address 0x7>

        res = <optimized out>

#2  0x00007f6a6b6b89de in ProjIndexIsUnchanged (newtup=0x7f6a6d504e80,
oldtup=0x7ffc9c1c1270, relation=0x7f6a6b4c1738) at /build/postgresql-11-
oDwOcQ/postgresql-11-11.0/build/../src/backend/access/heap/heapam.c:4539

        att = <optimized out>

        indexOid = <optimized out>

        indexDesc = 0x7f6a6b4c5008

        indexInfo = 0x7f6a6d505ca8

        i = 0

        indexoidlist = <optimized out>

        econtext = 0x7f6a6d4c5080

        new_isnull = {false, false, false, 156, 252, 127, false, false, 44,
29, 128, 107, 106, 127, false, false, 116, 54, 76, 107, 106, 127, false,
false, 123, false, false, false, false, false, false, false}

        slot = 0x7f6a6d507b78

        equals = true

        old_isnull = {false, false, false, 109, 106, 127, false, false, 165,
199, 80, 109, 106, 127, false, false, 160, 199, 80, 109, 106, 127, false,
false, 168, 199, 80, 109, 106, 127, false, false}

        new_values = {7, 4, 140095077240522, 16, 140722927572064,
140095077271456, 140095077272720, 140722927572079, 0, 0, 140722927572048,
140095054460096, 140095077060032, 140095077242520, 5, 4, 140722927572096,
140095046814123, 

          140722927572096, 140095046825162, 140095043410520,
140095077060880, 140722927572224, 140095046865098, 60129542543, 23274985272,
140095043410520, 140095077060032, 140095043409720, 140095077060304,
140095077237568, 

          140095043414504}

        indexno = 1

        l = 0x7f6a6d50d180

        estate = 0x7f6a6d4c4e70

        old_values = {7, 5, 140078657614882, 1, 140095077060032, 1024,
140095077271456, 0, 140722927571760, 140095049517261, 140722927571808,
140722927571776, 140722927571792, 140095046814329, 18, 140095077271464,
140095076986224, 

          140095077237568, 22, 100, 140722927571856, 140722927571920,
140095046831813, 140722927571856, 140095046842624, 0, 140095077265656, 3,
140095077271456, 140095077271792, 140095077271456, 140095077242520}

#3  heap_update (relation=relation(at)entry=0x7f6a6b4c1738, otid=otid(at)entry=0x7
ffc9c1c1690, newtup=newtup(at)entry=0x7f6a6d504e80, cid=0, crosscheck=<
optimized out>, wait=wait(at)entry=true, hufd=hufd(at)entry=0x7ffc9c1c15b0, 

    lockmode=lockmode(at)entry=0x7ffc9c1c15a4) at /build/postgresql-11-oDwOcQ/
postgresql-11-11.0/build/../src/backend/access/heap/heapam.c:4230

        result = <optimized out>

        xid = 245028971

        hot_attrs = 0x7f6a6d50d0a0

        proj_idx_attrs = 0x7f6a6d50d0b8

        key_attrs = 0x7f6a6d50d0d0

        id_attrs = 0x7f6a6d50d0e8

        interesting_attrs = 0x7f6a6d50d100

        lp = <optimized out>

        oldtup = {t_len = 81, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 9},
ip_posid = 71}, t_tableOid = 65471913, t_data = 0x7f669aa0b3f0}

        heaptup = 0x7f6a6d504e80

        old_key_tuple = 0x0

        old_key_copied = false

        page = 0x7f669aa0b180 "=="

        block = <optimized out>

        mxact_status = <optimized out>

        buffer = <optimized out>

        newbuf = 17858

        vmbuffer = 0

        vmbuffer_new = 0

        need_toast = <optimized out>

        newtupsize = <optimized out>

        pagefree = <optimized out>

        have_tuple_lock = false

        iscombo = false

        use_hot_update = false

        hot_attrs_checked = <optimized out>

        key_intact = <optimized out>

        all_visible_cleared = false

        all_visible_cleared_new = false

        checked_lockers = <optimized out>

        locker_remains = <optimized out>

        xmax_new_tuple = <optimized out>

        xmax_old_tuple = 245028971

        infomask_old_tuple = 0

        infomask2_old_tuple = 8192

        infomask_new_tuple = 144

        infomask2_new_tuple = 0

        __func__ = "heap_update"

#4  0x00007f6a6b82f2c9 in ExecUpdate (mtstate=mtstate(at)entry=0x7f6a6d4da3f0,
tupleid=0x7ffc9c1c1690, oldtuple=0x0, slot=0x7f6a6d504340, planSlot=
planSlot(at)entry=0x7f6a6d5685e8, epqstate=epqstate(at)entry=0x7f6a6d4da4b0,

    estate=estate(at)entry=0x7f6a6d4d8ed0, canSetTag=true) at /build/postgresql
-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/executor/nodeModifyTable.
c:1209

        lockmode = LockTupleNoKeyExclusive

        partition_constraint_failed = false

        tuple = 0x7f6a6d504e80

        resultRelInfo = 0x7f6a6d4d9110

        resultRelationDesc = 0x7f6a6b4c1738

        result = <optimized out>

        hufd = {ctid = {ip_blkid = {bi_hi = 5616, bi_lo = 39964}, ip_posid =
32764}, xmax = 1803594893, cmax = 32618}

        recheckIndexes = 0x0

        saved_tcs_map = 0x0

        __func__ = "ExecUpdate"

#5  0x00007f6a6b82f978 in ExecModifyTable (pstate=0x7f6a6d4da3f0) at /build/
postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/executor/
nodeModifyTable.c:2166

        node = 0x7f6a6d4da3f0

        proute = 0x0

        estate = 0x7f6a6d4d8ed0

        operation = CMD_UPDATE

        saved_resultRelInfo = 0x0

        resultRelInfo = 0x7f6a6d4d9110

        subplanstate = 0x7f6a6d5679f8

        junkfilter = 0x7f6a6d504218

        slot = <optimized out>

        tupleid = <optimized out>

        tuple_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 9}, ip_posid = 71}

        oldtupdata = {t_len = 2619086544, t_self = {ip_blkid = {bi_hi =
32764, bi_lo = 0}, ip_posid = 9204}, t_tableOid = 32618, t_data = 0x7ffc9c1c
16d0}

        oldtuple = <optimized out>

        __func__ = "ExecModifyTable"

#6  0x00007f6a6b80b25b in ExecProcNode (node=0x7f6a6d4da3f0) at /build/
postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/include/executor/
executor.h:237

No locals.

#7  ExecutePlan (execute_once=<optimized out>, dest=0x7f6a6d5ed298,
direction=<optimized out>, numberTuples=0, sendTuples=false, operation=CMD_
UPDATE, use_parallel_mode=<optimized out>, planstate=0x7f6a6d4da3f0, estate=
0x7f6a6d4d8ed0)

    at /build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/
executor/execMain.c:1723

        slot = <optimized out>

        current_tuple_count = 0

#8  standard_ExecutorRun (queryDesc=0x7f6a6d5f5b80, direction=<optimized out
>, count=0, execute_once=<optimized out>) at /build/postgresql-11-oDwOcQ/
postgresql-11-11.0/build/../src/backend/executor/execMain.c:364

        estate = 0x7f6a6d4d8ed0

        operation = CMD_UPDATE

        dest = 0x7f6a6d5ed298

        sendTuples = <optimized out>

        __func__ = "standard_ExecutorRun"

#9  0x00007f6a6b95b41d in ProcessQuery (plan=<optimized out>, 

    sourceText=0x7f6a6d3b45a0 "UPDATE public.schedulecard SET ext_ident=
NULL, finalreport_absence_cnt=NULL, period_day=4, period_num=1, rotates=
false, rotationgroup_id=NULL, schedulecard_id=NULL, schedulecardblock_id=
NULL, schedule"..., 

    params=0x0, queryEnv=0x0, dest=0x7f6a6d5ed298, completionTag=0x7ffc9c1c
19d0 "") at /build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/
backend/tcop/pquery.c:161

        queryDesc = 0x7f6a6d5f5b80

#10 0x00007f6a6b95b668 in PortalRunMulti (portal=portal(at)entry=0x7f6a6d47f
230, isTopLevel=isTopLevel(at)entry=true, setHoldSnapshot=setHoldSnapshot(at)entry
=false, dest=dest(at)entry=0x7f6a6d5ed298, altdest=altdest(at)entry=0x7f6a6d5ed
298, 

    completionTag=completionTag(at)entry=0x7ffc9c1c19d0 "") at /build/
postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/tcop/pquery.c:
1286

        pstmt = 0x7f6a6d5ed138

        active_snapshot_set = true

        stmtlist_item = 0x7f6a6d5ed248

#11 0x00007f6a6b95c2d5 in PortalRun (portal=portal(at)entry=0x7f6a6d47f230,
count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=true, run
_once=run_once(at)entry=true, dest=dest(at)entry=0x7f6a6d5ed298, 

    altdest=altdest(at)entry=0x7f6a6d5ed298, completionTag=completionTag(at)entry=
0x7ffc9c1c19d0 "") at /build/postgresql-11-oDwOcQ/postgresql-11-11.0/
build/../src/backend/tcop/pquery.c:799

        save_exception_stack = 0x7ffc9c1c1c00

        save_context_stack = 0x0

        local_sigjmp_buf = {{__jmpbuf = {140722927574016,
7153736072036697610, 140095075869696, 140722927573456, 140095076692528,
140095075869744, 7153736071992657418, 7091693995611473418}, __mask_was_saved
= 0, __saved_mask = {__val = {

                0, 140723125634669, 0, 140095051070678, 64, 112,
140095075869696, 140095075861648, 140095078191720, 140722927573328,
140095049517496, 140722927573360, 2, 140095075869696, 2, 140722927573360}}}}

        result = <optimized out>

        nprocessed = <optimized out>

        saveTopTransactionResourceOwner = 0x7f6a6d4182c8

        saveTopTransactionContext = 0x7f6a6d490150

        saveActivePortal = 0x0

        saveResourceOwner = 0x7f6a6d4182c8

        savePortalContext = 0x0

        saveMemoryContext = 0x7f6a6d490150

        __func__ = "PortalRun"

#12 0x00007f6a6b957f7f in exec_simple_query (

    query_string=0x7f6a6d3b45a0 "UPDATE public.schedulecard SET ext_ident=
NULL, finalreport_absence_cnt=NULL, period_day=4, period_num=1, rotates=
false, rotationgroup_id=NULL, schedulecard_id=NULL, schedulecardblock_id=
NULL, schedule"...) at /build/postgresql-11-oDwOcQ/postgresql-11-11.0/
build/../src/backend/tcop/postgres.c:1122

        parsetree = 0x7f6a6d3b6400

        portal = 0x7f6a6d47f230

        snapshot_set = <optimized out>

        commandTag = <optimized out>

        completionTag = "\000\032\034\234\374\177\000\000\341\356\247kj\177\
000\000\000\000\000\000\000\000\000\000\060\316\367kj\177\000\000\340&\360
kj\177\000\000\234U\357kj\177\000\000\240E;mj\177\000\000Q\000\000\000\000\
000\000"

        querytree_list = <optimized out>

        plantree_list = <optimized out>

        receiver = 0x7f6a6d5ed298

        format = 0

        dest = DestRemote

        parsetree_list = 0x7f6a6d3b6450

        parsetree_item = 0x7f6a6d3b6430

        save_log_statement_stats = false

        was_logged = false

        use_implicit_block = false

        msec_str = "\000\032\034\234\374\177\000\000\341\356\247kj\177\000\
000\000\000\000\000\000\000\000\000\060\316\367kj\177\000"

        __func__ = "exec_simple_query"

#13 0x00007f6a6b95930a in PostgresMain (argc=<optimized out>, argv=argv@
entry=0x7f6a6d400220, dbname=0x7f6a6d4001b0 "mydb", username=<optimized out>
)

    at /build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/
tcop/postgres.c:4159

        query_string = 0x7f6a6d3b45a0 "UPDATE public.schedulecard SET ext_
ident=NULL, finalreport_absence_cnt=NULL, period_day=4, period_num=1,
rotates=false, rotationgroup_id=NULL, schedulecard_id=NULL,
schedulecardblock_id=NULL, schedule"...

        firstchar = 81

        input_message = {

          data = 0x7f6a6d3b45a0 "UPDATE public.schedulecard SET ext_ident=
NULL, finalreport_absence_cnt=NULL, period_day=4, period_num=1, rotates=
false, rotationgroup_id=NULL, schedulecard_id=NULL, schedulecardblock_id=
NULL, schedule"..., len = 319, maxlen = 1024, cursor = 319}

        local_sigjmp_buf = {{__jmpbuf = {140095054460088,
7153736071902479882, 140095054460096, 1, 0, 140095076133088,
7153736071933937162, 7091693993451144714}, __mask_was_saved = 1, __saved_
mask = {__val = {0, 140095076133088, 

                140095004765576, 844424930131970, 206158430256,
140722927574360, 140722927574144, 293, 140095076133088, 0, 0,
140095075861648, 1024, 140722927574468, 0, 140722927574240}}}}

        send_ready_for_query = false

        disable_idle_in_transaction_timeout = false

        __func__ = "PostgresMain"

#14 0x00007f6a6b67867d in BackendRun (port=0x7f6a6d3f68e0) at /build/
postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/postmaster/
postmaster.c:4361

        ac = 1

        secs = 594476990

        usecs = 933865

        i = 1

        av = 0x7f6a6d400220

        maxac = <optimized out>

#15 BackendStartup (port=0x7f6a6d3f68e0) at /build/postgresql-11-oDwOcQ/
postgresql-11-11.0/build/../src/backend/postmaster/postmaster.c:4033

        bn = <optimized out>

        pid = <optimized out>

#16 ServerLoop () at /build/postgresql-11-oDwOcQ/postgresql-11-11.0/
build/../src/backend/postmaster/postmaster.c:1706

        rmask = {fds_bits = {8, 0 <repeats 15 times>}}

        selres = <optimized out>

        now = <optimized out>

        readmask = {fds_bits = {200, 0 <repeats 15 times>}}

        last_lockfile_recheck_time = 1541161751

        last_touch_time = 1541161147

        __func__ = "ServerLoop"

#17 0x00007f6a6b8e7644 in PostmasterMain (argc=5, argv=<optimized out>) at /
build/postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/
postmaster/postmaster.c:1379

        opt = <optimized out>

        status = <optimized out>

        userDoption = <optimized out>

        listen_addr_saved = true

        i = <optimized out>

        output_config_variable = <optimized out>

        __func__ = "PostmasterMain"

#18 0x00007f6a6b679616 in main (argc=5, argv=0x7f6a6d3af1f0) at /build/
postgresql-11-oDwOcQ/postgresql-11-11.0/build/../src/backend/main/main.c:228

No locals.

Best regards, 
Ondřej Bouda

---------- Původní e-mail ----------
Od: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Komu: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
Datum: 3. 11. 2018 4:57:19
Předmět: Re: backend crash on DELETE, reproducible locally
"Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net> writes:
> On Thu, Nov 01, 2018 at 11:43:56AM -0400, Tom Lane wrote:
>> Yeah, apparently we've passed a null OLD tuple to an RI_FKey_cascade_del
>> trigger, which surely shouldn't happen. It'd be interesting to look at
>> the set of triggers on this table. I don't entirely trust psql's \d
>> to show us reality if there's something screwed up about the triggers,
>> so in addition to \d output, could we see
>> select * from pg_trigger where tgrelid = 'ref.auto_hint'::regclass;

> [ pretty normal-looking trigger entries ]

I was feeling baffled about this, but it suddenly occurs to me that maybe
the bug fixed in 040a1df61/372102b81 explains this. If the particular
tuple you're trying to delete predates the last ALTER TABLE ADD COLUMN
operation on the table, then this bug would result in t_self getting
set to "invalid", which'd cause AfterTriggerSaveEvent to save "ate_ctid1"
as "invalid", which'd lead to a null tuple getting passed when the trigger
eventually gets invoked.

regards, tom lane

"

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-11-03 19:38:14 Re: backend crash on DELETE, reproducible locally
Previous Message Karsten Hilbert 2018-11-03 16:03:14 Re: backend crash on DELETE, reproducible locally

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-11-03 19:24:28 Re: partitioned indexes and tablespaces
Previous Message Andres Freund 2018-11-03 18:42:59 Re: plruby: rb_iterate symbol clash with libruby.so