Re: Re: Abbreviated keys for Datum tuplesort

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Abbreviated keys for Datum tuplesort
Date: 2015-03-14 02:51:10
Message-ID: 87wq2knxf2.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Peter" == Peter Geoghegan <pg(at)heroku(dot)com> writes:

Peter> I attach a slightly tweaked version of Andrew's original.

You changed this:

static int
comparetup_datum(const SortTuple *a, const SortTuple *b, Tuplesortstate *state)
{
- int compare;
+ int32 compare;

compare = ApplySortComparator(a->datum1, a->isnull1,

Since ApplySortComparator returns int, and "compare" is used to store
the return value of comparetup_datum which is also declared int, this
seems inappropriate even as a "stylistic tweak".

Also, your changes to the block comment for SortTuple now hide the fact
that datum1 is potentially the abbreviated value in tuple as well as
single-Datum cases. Here are the versions for comparison (mine is
first):

***************
*** 136,175 ****
/*
* The objects we actually sort are SortTuple structs. These contain
* a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
* which is a separate palloc chunk --- we assume it is just one chunk and
* can be freed by a simple pfree(). SortTuples also contain the tuple's
* first key column in Datum/nullflag format, and an index integer.
*
* Storing the first key column lets us save heap_getattr or index_getattr
* calls during tuple comparisons. We could extract and save all the key
* columns not just the first, but this would increase code complexity and
* overhead, and wouldn't actually save any comparison cycles in the common
* case where the first key determines the comparison result. Note that
* for a pass-by-reference datatype, datum1 points into the "tuple" storage.
*
- * There is one special case: when the sort support infrastructure provides an
- * "abbreviated key" representation, where the key is (typically) a pass by
- * value proxy for a pass by reference type. In this case, the abbreviated key
- * is stored in datum1 in place of the actual first key column.
- *
* When sorting single Datums, the data value is represented directly by
! * datum1/isnull1 for pass by value types (or null values). If the datatype is
! * pass-by-reference and isnull1 is false, then "tuple" points to a separately
! * palloc'd data value, otherwise "tuple" is NULL. The value of datum1 is then
! * either the same pointer as "tuple", or is an abbreviated key value as
! * described above. Accordingly, "tuple" is always used in preference to
! * datum1 as the authoritative value for pass-by-reference cases.
*
* While building initial runs, tupindex holds the tuple's run number. During
* merge passes, we re-use it to hold the input tape number that each tuple in
* the heap was read from, or to hold the index of the next tuple pre-read
* from the same tape in the case of pre-read entries. tupindex goes unused
* if the sort occurs entirely in memory.
*/
typedef struct
{
void *tuple; /* the tuple proper */
Datum datum1; /* value of first key column */
bool isnull1; /* is first key column NULL? */
int tupindex; /* see notes above */
} SortTuple;
--- 136,170 ----
/*
* The objects we actually sort are SortTuple structs. These contain
* a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
* which is a separate palloc chunk --- we assume it is just one chunk and
* can be freed by a simple pfree(). SortTuples also contain the tuple's
* first key column in Datum/nullflag format, and an index integer.
*
* Storing the first key column lets us save heap_getattr or index_getattr
* calls during tuple comparisons. We could extract and save all the key
* columns not just the first, but this would increase code complexity and
* overhead, and wouldn't actually save any comparison cycles in the common
* case where the first key determines the comparison result. Note that
* for a pass-by-reference datatype, datum1 points into the "tuple" storage.
*
* When sorting single Datums, the data value is represented directly by
! * datum1/isnull1. If the datatype is pass-by-reference and isnull1 is false,
! * then datum1 points to a separately palloc'd data value that is also pointed
! * to by the "tuple" pointer; otherwise "tuple" is NULL. There is one special
! * case: when the sort support infrastructure provides an "abbreviated key"
! * representation, where the key is (typically) a pass by value proxy for a
! * pass by reference type.
*
* While building initial runs, tupindex holds the tuple's run number. During
* merge passes, we re-use it to hold the input tape number that each tuple in
* the heap was read from, or to hold the index of the next tuple pre-read
* from the same tape in the case of pre-read entries. tupindex goes unused
* if the sort occurs entirely in memory.
*/
typedef struct
{
void *tuple; /* the tuple proper */
Datum datum1; /* value of first key column */
bool isnull1; /* is first key column NULL? */
int tupindex; /* see notes above */
} SortTuple;

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2015-03-14 03:17:29 Re: Using 128-bit integers for sum, avg and statistics aggregates
Previous Message David Rowley 2015-03-14 01:51:49 Re: Performance improvement for joins where outer side is unique