Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date: 2015-12-09 22:15:02
Message-ID: CAM3SWZSxakL6Uynep+sMXOPKVM23BEvQ37kzap1rhetwHdRkfg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 9, 2015 at 11:31 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I find the references to a "void" representation in this patch to be
> completely opaque. I see that there are some such references in
> tuplesort.c already, and most likely they were put there by commits
> that I did, so I guess I have nobody but myself to blame, but I don't
> know what this means, and I don't think we should let this terminology
> proliferate.
>
> My understanding is that the "void" representation is intended to
> be whatever Datum we originally got, which might be a pointer. Why not
> just say that instead of referring to it this way?

That isn't what is intended. "void" is the state in which macros like
index_getattr() leave NULL leading attributes (the ones that go in the
SortTuple.datum1 field). However, the function tuplesort_putdatum()
requires callers to initialize their Datum to 0; this requirement is
new. A "void" representation is a would-be NULL pointer (a Datum of 0)
in the case of pass-by-value types, and a NULL pointer for
pass-by-reference types.
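To make the "void" state concrete, here is a minimal C sketch under simplifying assumptions. Datum is modeled as a plain uintptr_t, and MiniSortTuple, mini_getattr() and mini_putdatum() are hypothetical names, not PostgreSQL code. It only mirrors the two facts stated above: a NULL leading attribute comes back as (Datum) 0 regardless of type, and a putdatum-style caller must pass a NULL in that void form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's Datum: an unsigned integer wide
 * enough to hold either a pass-by-value datum or a pointer. */
typedef uintptr_t Datum;

/* Hypothetical, trimmed-down sort tuple: only the leading key slot. */
typedef struct
{
    Datum datum1;   /* leading key (in tuplesort, possibly its abbreviation) */
    bool  isnull1;  /* true if the leading key is NULL */
} MiniSortTuple;

/* Hypothetical accessor modeled on index_getattr()'s NULL handling:
 * a NULL attribute sets *isnull and yields the "void" Datum, (Datum) 0,
 * whether the type is pass-by-value or pass-by-reference. */
static Datum
mini_getattr(const Datum *values, const bool *nulls, int attno, bool *isnull)
{
    *isnull = nulls[attno];
    return *isnull ? (Datum) 0 : values[attno];
}

/* Hypothetical putdatum: the caller must hand over a NULL in its "void"
 * form, so datum1 never holds stray bits for a NULL key. */
static void
mini_putdatum(MiniSortTuple *st, Datum val, bool isnull)
{
    assert(!isnull || val == (Datum) 0);
    st->datum1 = val;
    st->isnull1 = isnull;
}
```

In this toy model, two NULL keys always carry the same datum1 value, so nothing about a NULL key depends on which code path produced it.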

> My understanding is also that it's OK if the abbreviated key stays the
> same even though the value has changed, but that the reverse could
> cause queries to return wrong answers. The first part of that
> justifies why this is safe when no abbreviation is available: we'll
> return an abbreviated value of 0 for everything, but that's fine.
> However, using the original Datum (which might be a pointer) seems
> unsafe, because two binary-identical values could be stored at
> different addresses and thus have different pointer representations.
>
> I'm probably missing something here, so clue me in...
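Robert's safety argument can be illustrated with a toy comparator, sketched here under stated assumptions. MiniKey, mini_compare() and the three abbrev_*() functions are hypothetical, Datum is modeled as uintptr_t, and strcmp() stands in for the authoritative comparator. An all-zero abbreviation is safe because every comparison falls through to the full values. A pointer-valued "abbreviation" is unsafe because two equal strings at different addresses get different keys and are never compared in full.

```c
#include <stdint.h>
#include <string.h>

typedef uintptr_t Datum;   /* hypothetical stand-in for PostgreSQL's Datum */

/* Hypothetical sort key: an abbreviated key plus the full C string. */
typedef struct
{
    Datum       abbrev;
    const char *full;
} MiniKey;

/* Compare abbreviated keys first and consult the full values only on a tie.
 * This is correct only if abbreviation never orders two values differently
 * from the full comparison; in particular, equal values must always get
 * equal abbreviated keys. */
static int
mini_compare(const MiniKey *a, const MiniKey *b)
{
    int c;

    if (a->abbrev != b->abbrev)
        return a->abbrev < b->abbrev ? -1 : 1;
    c = strcmp(a->full, b->full);   /* authoritative tie-breaker */
    return (c > 0) - (c < 0);
}

/* Safe but useless: every value abbreviates to 0, so every comparison
 * falls through to strcmp(). */
static Datum
abbrev_zero(const char *s)
{
    (void) s;
    return 0;
}

/* Safe: the first byte never contradicts strcmp() order. */
static Datum
abbrev_first_byte(const char *s)
{
    return (unsigned char) s[0];
}

/* Unsafe: equal strings stored at different addresses get different
 * "abbreviated keys", so mini_compare() reports them as unequal. */
static Datum
abbrev_pointer(const char *s)
{
    return (Datum) s;
}
```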

I think that you're missing that patch 0001 formally forbids
abbreviated keys for pass-by-value types, by revising the contract
(this is proposed for backpatch to 9.5; only comments are changed).
This is already all but forbidden, although the datum case does
tacitly acknowledge the possibility by not allowing abbreviation to
work in the pass-by-value-and-yet-abbreviated case.
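Here is a minimal C sketch of what the revised contract implies at sort setup time. MiniSortSupport, mini_sort_setup() and mini_cstring_abbrev() are hypothetical names, and this is not the sortsupport.h or tuplesort.c code. The prefix-packing converter is only an illustrative abbreviation scheme for a pass-by-reference string type: the key is computed from the pointed-to bytes, never from the pointer. A pass-by-value type never receives a converter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Datum;   /* hypothetical stand-in for PostgreSQL's Datum */

typedef Datum (*MiniAbbrevConverter) (Datum original);

/* Hypothetical, trimmed-down sort support state. */
typedef struct
{
    bool                abbreviate;        /* may abbreviation be attempted? */
    MiniAbbrevConverter abbrev_converter;  /* NULL when not abbreviating */
} MiniSortSupport;

/* Hypothetical converter for a pass-by-reference C-string type: pack the
 * first sizeof(Datum) bytes most-significant-first, zero-padding short
 * strings, so unsigned comparison of keys agrees with strcmp() on that
 * prefix; strings sharing the whole prefix tie and need a full compare. */
static Datum
mini_cstring_abbrev(Datum original)
{
    const char *s = (const char *) original;
    Datum       key = 0;
    size_t      i;

    for (i = 0; i < sizeof(Datum); i++)
    {
        key <<= 8;
        if (*s != '\0')
            key |= (unsigned char) *s++;
    }
    return key;
}

/* Setup in the spirit of the revised contract: only pass-by-reference
 * types may be abbreviated, so a pass-by-value key never gets a converter
 * and its datum1 always holds the original value. */
static void
mini_sort_setup(MiniSortSupport *ssup, bool typbyval,
                MiniAbbrevConverter converter)
{
    ssup->abbreviate = !typbyval && converter != NULL;
    ssup->abbrev_converter = ssup->abbreviate ? converter : NULL;
}
```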

I think that this revision is also useful for putting abbreviated keys
in indexes, something that may happen yet.

--
Peter Geoghegan
