Re: BUG #15703: Segfault in cancelled CALL-Statements

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: julian(dot)schauder(at)gmx(dot)de
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #15703: Segfault in cancelled CALL-Statements
Date: 2019-03-20 15:56:13
Message-ID: 8678.1553097373@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> there seems to be a SEGFAULT issue with CALL-Procedures once they get
> SIGINT'ed.

What I'm seeing in an assert-enabled build is an assertion failure:

TRAP: FailedAssertion("!(portal->resowner == ((void *)0))", File: "portalmem.c"\
, Line: 874)

#0 0x00007fbe9cd6beab in raise () from /lib64/libc.so.6
#1 0x00007fbe9cd565b9 in abort () from /lib64/libc.so.6
#2 0x000000000089d9cf in ExceptionalCondition (conditionName=conditionName(at)entry=0xac34b0 "!(portal->resowner == ((void *)0))",
errorType=errorType(at)entry=0x8ec7c9 "FailedAssertion", fileName=fileName(at)entry=0xac3163 "portalmem.c", lineNumber=lineNumber(at)entry=874)
at assert.c:54
#3 0x00000000008ca924 in AtCleanup_Portals () at portalmem.c:874
#4 0x00000000005145c7 in CleanupTransaction () at xact.c:2676
#5 0x00000000005186ca in AbortCurrentTransaction () at xact.c:3133
#6 0x0000000000789004 in PostgresMain (argc=1, argv=argv(at)entry=0x2bc3ad0, dbname=<optimized out>, username=0x2b96518 "tgl") at postgres.c:4033
#7 0x000000000070d91f in BackendRun (port=0x2bbce20, port=0x2bbce20) at postmaster.c:4399
#8 BackendStartup (port=0x2bbce20) at postmaster.c:4090
#9 ServerLoop () at postmaster.c:1703
#10 0x000000000070e81e in PostmasterMain (argc=argc(at)entry=2, argv=argv(at)entry=0x2b944c0) at postmaster.c:1376
#11 0x000000000047acae in main (argc=2, argv=0x2b944c0) at main.c:228

(gdb) f 3
#3 0x00000000008ca924 in AtCleanup_Portals () at portalmem.c:874
874 Assert(portal->resowner == NULL);
(gdb) p *portal
$1 = {name = 0x2bff4b8 "<unnamed portal 1>", prepStmtName = 0x0, portalContext = 0x2c728a0, resowner = 0x2c141d0, cleanup = 0x0, createSubid = 1,
activeSubid = 1, sourceText = 0x2c729b8 "SELECT GENERATE_SERIES(1,100000)", commandTag = 0xa10eb3 "SELECT", stmts = 0x2c74e30,
cplan = 0x2c75520, portalParams = 0x0, queryEnv = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 4, run_once = false,
status = PORTAL_FAILED, portalPinned = true, autoHeld = true, queryDesc = 0x0, tupDesc = 0x2c84118, formats = 0x0, holdStore = 0x2c83ee8,
holdContext = 0x2c83dd0, holdSnapshot = 0x0, atStart = false, atEnd = false, portalPos = 10, creation_time = 606411039547087, visible = true}

Poking around a bit, it seems like this is a thinko in HoldPinnedPortals:
it's not considering the possibility of a failure during HoldPortal().
In your test the failure is triggered by query cancel, but an ordinary
execution error while finishing out a cursor's calculations would be
a problem too.

I think it should not mark the portal as autoHeld until it's been
successfully held. The attached patch seems to make the problem
go away for me --- can you confirm it fixes your original issue?

I find HoldPinnedPortals suspicious in another way: is it really
OK for it to mark *all* pinned portals as auto-held? What if we're
in a nest of procedures run by different PLs (of which only the
innermost will have any idea we're committing)? Will the right
things happen if we auto-hold a portal or two and then get a failure
on another one?

regards, tom lane

Attachment Content-Type Size
auto-held-portal-fix.patch text/x-diff 503 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2019-03-20 16:07:47 Re: BUG #15703: Segfault in cancelled CALL-Statements
Previous Message David Rowley 2019-03-20 11:43:26 Re: BUG #15572: Misleading message reported by "Drop function operation" on DB with functions having same name