Re: Progress on fast path sorting, btree index creation time

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, "Jim Decibel! Nasby" <decibel(at)decibel(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-02-08 18:48:53
Message-ID: CAEYLb_WH34D+Ykk7Zy+Jr-3r-Z0-VdkmS-FT93DO1o22XrgmsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8 February 2012 17:58, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It seems clear that the single sort-key optimizations are a much
> better value per byte of code than the type-specific optimizations.
> Ignoring client overhead, we get between half and two-thirds of the
> benefit, and it costs us just one extra copy of the quicksort logic
> for all data types.

That was clear from an early stage, and is something that I
acknowledged way back in September - it's pretty obvious that chasing
diminishing returns is the nature of this sort of thing, which is
fine, provided that they are not chased beyond some point that doesn't
make sense. However, I do not see why we should not at least include
full integer specialisations as well (for single-keys at least), given
that we get the additional, not inconsiderable improvements. I do not
accept the premise that we need to find the optimal bytes added to
performance increase ratio, because, as I've already stressed, adding
bytes to the application binary does not have a linear cost.

Here's what might be a good compromise:

singlekey generic, singlekey int4, singlekey int8, multikey generic.

That is a total number of 4 contiguous copies of the qsort, because
we've obsoleted the original qsort_arg (accept for btree index
sorting, which likely matters far less). We get the most benefit for 5
very common types - int4, int8, date, timestamp and timestamptz. A 30%
improvement has to be better than the 19% speedup for just the single
sort-key optimisations, considering that in practice these types are
very commonly used.

I think that there may be additional benefits from making the
qsort_arg specialisation look less like a c stdlib one, like refining
the swap logic to have compile-time knowledge of the type it is
sorting. I'm thinking that we could usefully trim quite a bit from
this:

#define SWAPINIT(a, es) swaptype = ((char *)(a) - (char *)0) % sizeof(long) || \
(es) % sizeof(long) ? 2 : (es) == sizeof(long)? 0 : 1;

#define thistype_vecswap(a, b, n) \
if ((n) > 0) inl_swapfunc((a), (b), (size_t)(n), swaptype)

#define thistype_swap(a, b) \
if (swaptype == 0) { \
long t = *(long *)(void *)(a); \
*(long *)(void *)(a) = *(long *)(void *)(b); \
*(long *)(void *)(b) = t; \
} else \
inl_swapfunc(a, b, es, swaptype)

inline static void
inl_swapfunc(char *a, char *b, size_t n, int swaptype)
{
if (swaptype <= 1)
swapcode(long, a, b, n);
else
swapcode(char, a, b, n);
}

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-02-08 19:15:09 Re: Bugs/slowness inserting and indexing cubes
Previous Message Fujii Masao 2012-02-08 18:39:41 pg_receivexlog and sync rep Re: Updated version of pg_receivexlog