Re: Composite Datums containing toasted fields are a bad idea(?)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Composite Datums containing toasted fields are a bad idea(?)
Date: 2014-04-24 23:40:30
Message-ID: 25016.1398382830@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> I'm actually planning to set this patch on the shelf for a bit and go
> investigate the other alternative, ie, not generating composite Datums
> containing toast pointers in the first place.

Here's a draft patch along those lines. It turned out to be best to
leave heap_form_tuple() alone and instead put the dirty work into
HeapTupleGetDatum(). That has some consequences worth discussing:

* The patch changes HeapTupleGetDatum from a simple inline macro into
a function call. This means that third-party extensions will not get
protection against creation of toast-pointer-containing composite Datums
until they recompile. I'm not sure that this is a big deal, though.
After looking through the existing core and contrib code, it seems that
nearly all users of HeapTupleGetDatum are building their tuples from
locally-sourced data that could not possibly be toasted anyway. (This is
true a-priori for users of BuildTupleFromCStrings, for example, and seems
to be true of all uses of HeapTupleGetDatum in SQL-callable functions.)
So it's entirely possible that there would be no live bug anywhere even
without recompiles.

* If we were sufficiently worried about the previous point, we could
get some protection against unfixed extension code by not removing
toast_flatten_tuple_attribute() in the back branches, only in HEAD.
I'm doubtful that it's worth the cycles though.

* Because HeapTupleGetDatum might allocate a new tuple, the wrong thing
might happen if the caller changes CurrentMemoryContext between
heap_form_tuple and HeapTupleGetDatum. I could only find two places
that did that, though, both in plpgsql. I thought about having
HeapTupleGetDatum try to identify the context the passed tuple had been
allocated in, but the problem with that is the passed tuple isn't
necessarily heap-allocated at all --- in fact, one of the two problematic
places in plpgsql passes a pointer to a stack-local variable, so we'd
actually crash if we tried to apply GetMemoryChunkContext() to it.
Of course we can (and I did) change plpgsql, but the question is whether
any third-party code has copied either coding pattern. On balance it
seems best to just use palloc; that's unlikely to cause anything worse
than a memory leak.

* I was quite pleased with the size of the patch: under 100 net new lines,
half of that being a new regression test case. And it's worth emphasizing
that this is a *complete* fix, modulo the question of third-party code
recompiles. The patch I showed a few days ago added several times that
much just to fix arrays of composites, and I wasn't too confident that it
fixed every case even for arrays.

* The cases where an "extra" detoast would be incurred are pretty narrow:
basically, whole-row-Var references, ROW() constructs, and the outputs of
functions returning tuples. As discussed earlier, whether this would cost
anything or save anything would depend on the number of subsequent uses of
the composite Datum's previously-toasted fields. But I'm thinking that
the amount of application code that would actually be impacted in either
direction is probably pretty small. Moreover, we aren't adding any
noticeable overhead in cases where no detoasting occurs, unlike the
situation with the previous patch. (In fact, we're saving some overhead
by getting rid of syscache lookups in toast_flatten_tuple_attribute.)

So, despite Noah's misgivings, I'm thinking this is the way to proceed.

Comments?

regards, tom lane

Attachment Content-Type Size
no-toast-pointers-in-composite-datums-1.patch text/x-diff 33.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marti Raudsepp 2014-04-25 00:23:04 Re: UUIDs in core WAS: 9.4 Proposal: Initdb creates a single table
Previous Message Peter Geoghegan 2014-04-24 22:26:17 Re: Clock sweep not caching enough B-Tree leaf pages?