Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-07 00:04:05
Message-ID: CAM3SWZQHJS=Bg9esOZo7V3nXTjEiAMyjVmjJyk-feLz13m6b8g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Dec 6, 2015 at 3:59 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> My column has the format of ABC-123-456-789-0
>
> The name-space identifier ("ABC-") is the same in 99.99% of the
> cases. And to date, as well as in my extrapolation, the first two
> digits of the numeric part are leading zeros and the third one is
> mostly 0,1,2. So the first 8 bytes really have less than 2 bits worth
> of information. So yeah, not surprising abbreviation was not useful.

I think that given you're using the "C" collation, abbreviation should
still go ahead. I posted a patch to do that, which I need to further
justify per Robert's request (currently, we do nothing special based
on collation). Abbreviation should help in surprisingly marginal
cases, since far fewer memory accesses will be required in the early
stages of the sort with only (say) 5 distinct abbreviated keys. Once
abbreviated comparisons start to not help at all (with quicksort, at
some partition), there's a good chance that the full keys can be
reused to some extent, before being evicted from CPU caches.

>> BTW, roughly what does this CREATE INDEX look like? Is it a composite
>> index, for example?
>
> Nope, just a single column index. In the extrapolated data set, each
> distinct value shows up a couple hundred times on average. I'm
> thinking of converting it to a btree_gin index once I've tested them a
> bit more, as the compression benefits are substantial.

Unfortunately, that cannot use tuplesort.c at all.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-12-07 00:25:23 Re: Using quicksort for every external sort run
Previous Message Jeff Janes 2015-12-06 23:59:25 Re: Using quicksort for every external sort run