Re: PostgreSQL crashes with SIGSEGV

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andreas Seltenreich <andreas(dot)seltenreich(at)credativ(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: PostgreSQL crashes with SIGSEGV
Date: 2017-12-17 01:40:24
Message-ID: CAH2-WzkAWqMQCOoT7xV_otN77FGa8psK8rVM_gu9Q=R7XiQVtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Thu, Dec 14, 2017 at 5:58 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The ambiguity about who owns the tuple memory when is the basic
> problem, once again. One could argue that it's the caller's fault for
> not knowing to not pfree() the tuple after tuplesort_end() is called,
> but I don't buy that because it seems like an undue burden on callers
> to do that. It seems okay to either provide a very weak, simple
> guarantee about tuple memory lifetime, or very strong simple guarantee
> about tuple memory lifetime. That's what routines like
> tuplesort_gettupleslot() and tuplesort_getindextuple() should limit
> their contracts to -- we've had enough problems in this area over the
> years that that seems pretty clear to me.
>
> ISTM that an appropriate fix is one that results in having
> tuplesort_gettupleslot() tuple memory that is *consistently* allocated
> within caller's own memory context, at least on versions prior to
> Postgres 10 (it's a bit more complicated in Postgres 10).

This took longer than expected. Attached patch, which targets 9.6,
shows what I have in mind. I'm going on vacation shortly, but wanted
to post this patch before I leave. It could definitely use some more
testing, since it was written under time pressure. I expect to be back
early in the new year, so if someone feels like taking this off my
hands, they're more than welcome to do so.

Postgres v10 isn't affected because there is no such thing as
should_free from commit f1f5ec1ef onwards (commit fa117ee40
reintroduced a no-copy optimization a while later, which looks like it
made things unsafe again, but it's actually fine -- the
ExecClearTuple() pfree() won't happen, so no crash). Everything is
fine on v10 because the design was improved for v10. On v10,
tuplesort_gettupleslot() calls ExecStoreMinimalTuple() with arguments
that indicate that the memory still belongs to the tuplesort, unless
it was an explicit copy made with heap_copy_minimal_tuple() there and
then, in which case it belong to caller. This works because on v10
we're always managing some temp buffers, regardless of state->status.
There is no question of a routine like readtup_alloc() allocating or
using memory without knowing about its lifetime, because it's always
using the smallish slab allocator buffers.

To put in another way, in v10 everything is fine because callers to
routines like tuplesort_gettupleslot() either have very limited
guarantees about fetched tuple memory that is owned by tuplesort (it
can go away almost immediately, on the next call to
tuplesort_gettupleslot()), or infinitely flexible guarantees (it's
unambiguously caller's copy, and lives in caller's memory context).
There is no awkward mix of these two situations, as is the case in
earlier versions. That awkward mix is: "You (caller) should free this
memory, and can rely on this memory sticking around until
tuplesort_end() is called, at which point I (tuplesort) will free it
if you haven't -- though I might *not* free it then, based on whether
or not I felt like using your memory context!". This is an accident
waiting to happen. Remember, the crash isn't a use-after-free; it's a
double-free caused by conflated responsibilities across two low-level,
refcount-like mechanisms. The idea of a tts_shouldFree=true tuple
whose memory exists in a context that isn't under the executor's
direct control is a fundamentally bogus idea. It kinda works most of
the time, but only to the extent that the lifetime of a tuplesort is
accidentally tied to the lifetime of an executor node, without the
tuple/slot making in into something like estate->es_tupleTable (as was
the case in Andreas' example).

An alternative approach to fixing this issue, that I also want to
throw out there, is perform a simple change within
ExecResetTupleTable() to make ExecClearTuple()'d slots force
"tts_shouldFree = false" -- a second patch shows what I mean here. All
ExecResetTupleTable() callers pass "shouldFree = false" anyway,
because all memory is about to go away (within FreeExecutorState()).
This second approach seems like a real kludge to me, but it's also a
fix that requires a tiny tweak, rather than non-trivial tuplesort.c
memory management changes. I would feel worse about the idea of
applying this kludge if we had to rely on it forever, but since v10 is
unaffected anyway, we really don't. But it's still an awful kludge. I
find it hard to know what to do here (the vacation is overdue, it
seems).

(Note that v10 does have a minor, related bug, noted in commit
messages of both patches.)

--
Peter Geoghegan

Attachment Content-Type Size
WIP-kludge-fix.patch text/x-patch 6.3 KB
WIP-tuplesort-memcontext-fix.patch text/x-patch 23.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2017-12-17 10:46:17 Re: PostgreSQL crashes with SIGSEGV
Previous Message Amit Kapila 2017-12-16 07:26:24 Re: 10.1: hash index size exploding on vacuum full analyze

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-12-17 01:48:32 Re: pgsql: Provide overflow safe integer math inline functions.
Previous Message Andres Freund 2017-12-17 01:35:05 Re: pgsql: Provide overflow safe integer math inline functions.