From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: matt(dot)jibson(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16213: segfault when running a query
Date: 2020-01-17 04:03:46
Message-ID: 635.1579233826@sss.pgh.pa.us
Lists: pgsql-bugs, pgsql-hackers
PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> The above query produces an error in the server log:
> LOG: server process (PID 108) was terminated by signal 11: Segmentation
> fault
Yeah, duplicated here. (For anyone following along at home: you
can "create domain string as text", or just s/STRING/TEXT/g in the
given query. That type's not relevant to the problem.) The problem
is probably much more easily reproduced in a debug build, because it
boils down to a dangling-pointer bug. I duplicated it back to 9.4,
and it's probably older than that.
The direct cause of the crash is that by the time we get to ExecutorEnd,
there are dangling pointers in the es_tupleTable list. Those pointers
turn out to originate from ExecInitSubPlan's creation of TupleTableSlots
for the ProjectionInfo objects it creates when doing hashing. And the
reason they're dangling is that the subplan is inside a VALUES list,
and nodeValuesscan.c does this remarkably bletcherous thing:
* Build the expression eval state in the econtext's per-tuple memory.
* This is a tad unusual, but we want to delete the eval state again
* when we move to the next row, to avoid growth of memory
* requirements over a long values list.
It turns out that just below that, there's already some messy hacking
to deal with subplans in the VALUES list. But I guess we'd not hit
the case of a subplan using hashing within VALUES.
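For anyone who wants the failure mode in miniature: here's a toy C sketch, emphatically not PostgreSQL code (the arena, its alloc/reset functions, and the 0x7f clobber byte are all stand-ins, the last borrowed from the spirit of CLOBBER_FREED_MEMORY in debug builds). It shows what goes wrong when longer-lived state keeps a pointer into memory that a per-tuple reset wipes out from under it:

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Toy stand-in for a per-tuple memory context: a fixed buffer that is
 * clobbered on "reset", the way a debug build stomps on freed memory. */
static int per_tuple_arena[16];
static size_t arena_used;

static int *arena_alloc(void)
{
    return &per_tuple_arena[arena_used++];
}

static void arena_reset(void)
{
    /* Mimic CLOBBER_FREED_MEMORY: stomp on everything allocated. */
    memset(per_tuple_arena, 0x7f, sizeof per_tuple_arena);
    arena_used = 0;
}

/* Build a value in per-tuple memory, hook a pointer to it into
 * longer-lived state (the analogue of es_tupleTable), then move to
 * the "next row" -- and return what that pointer now sees. */
static int stale_read_after_reset(void)
{
    int *slot = arena_alloc();        /* "slot" built for row 1 */
    *slot = 42;
    int *long_lived_entry = slot;     /* retained in longer-lived state */
    arena_reset();                    /* next row: per-tuple memory reset */
    return *long_lived_entry;         /* dangling read, sees the clobber */
}
```

In the real executor the stale pointer survives until ExecutorEnd walks es_tupleTable, which is why the crash shows up so far from the scan node that caused it.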
The attached draft patch fixes this by not letting ExecInitSubPlan hook
the slots it's making into the outer es_tupleTable list. Ordinarily
that would be bad because it exposes us to leaking resources, if the
slots aren't cleared before ending execution. But nodeSubplan.c is
already being (mostly) careful to clear those slots promptly, so it
doesn't cost us anything much to lose this backstop.
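The shape of that fix, again reduced to a toy C sketch with invented names (SlotTable standing in for es_tupleTable, Slot for TupleTableSlot): the subplan's slot is simply never hooked into the shared table, and the subplan takes responsibility for clearing it promptly itself, so there is nothing in the table to dangle:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy analogue of es_tupleTable: a list of slot pointers that
 * executor shutdown would otherwise clean up as a backstop. */
#define MAX_SLOTS 16
typedef struct { void *slots[MAX_SLOTS]; int nslots; } SlotTable;

typedef struct { int dummy; } Slot;   /* stand-in for TupleTableSlot */

/* The old way: the new slot is registered in the shared table, so a
 * per-tuple reset can leave the table holding a dangling pointer. */
static Slot *make_slot_registered(SlotTable *tab)
{
    Slot *s = malloc(sizeof *s);
    tab->slots[tab->nslots++] = s;
    return s;
}

/* The fix, in spirit: the slot stays private to the subplan, which
 * clears it promptly itself; the shared table never learns of it. */
static Slot *make_slot_private(SlotTable *tab)
{
    (void) tab;                       /* deliberately not registered */
    return malloc(sizeof(Slot));
}

/* One subplan episode: table length must come out unchanged. */
static int fix_keeps_table_untouched(void)
{
    SlotTable tab = { {NULL}, 0 };
    int before = tab.nslots;
    Slot *s = make_slot_private(&tab);
    /* ... use the slot for hashing ... */
    free(s);                          /* cleared promptly, per nodeSubplan.c */
    return tab.nslots - before;       /* 0: nothing left behind */
}

/* Contrast: the registered path grows the table by one. */
static int old_way_grows_table(void)
{
    SlotTable tab = { {NULL}, 0 };
    Slot *s = make_slot_registered(&tab);
    int grew = tab.nslots;
    free(s);                          /* tab now holds a dangling entry */
    return grew;
}
```

The trade-off is exactly as described above: forgoing the registration means forgoing the shutdown-time backstop, which is tolerable only because the subplan code already clears its slots eagerly.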
What that change fails to do is guarantee that there are no such
bugs elsewhere. In the attached I made nodeValuesscan.c assert that
nothing has added slots to the es_tupleTable list, but of course
that only catches cases where there's a live bug. Given how long
this case escaped detection, I don't feel like that's quite enough.
(Unsurprisingly, the existing regression tests don't trigger this
assert, even without the nodeSubplan.c fix.)
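The assertion is, in spirit, just a length snapshot taken around the per-row evaluation. A toy sketch, assuming a list with a simple entry counter (the real check is against the length of the es_tupleTable list):

```c
#include <assert.h>

typedef struct { int nentries; } PtrList;   /* toy es_tupleTable */

/* Evaluate one VALUES row's expressions in per-tuple memory, and
 * verify that nothing registered new entries whose storage the
 * per-row reset is about to clobber. */
static int evaluate_row_guarded(int nentries_before)
{
    PtrList es_tupleTable = { nentries_before };
    int saved = es_tupleTable.nentries;

    /* ... build and run expression state in per-tuple memory ... */

    /* A well-behaved subplan must not have touched the list. */
    assert(es_tupleTable.nentries == saved);
    return es_tupleTable.nentries;
}
```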
Another angle I've not run to ground is that the comments for the
existing subplan-related hacking in nodeValuesscan.c claim that
a problematic subplan could only occur in conjunction with LATERAL.
But there's no LATERAL in this example --- are those comments wrong
or obsolete, or just talking about a different case?
I didn't work on making a minimal test case for the regression tests,
either.
Anyway, thanks for the report!
regards, tom lane
Attachment: fix-for-hashed-subplan-in-VALUES-1.patch (text/x-diff, 5.4 KB)