Re: Using quicksort for every external sort run

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Using quicksort for every external sort run
Date: 2015-11-29 10:01:46
Message-ID: CAM3SWZQrT0OH-F=0X=SAO=jKxHD1KLqS_Kch4bR6CkdJKChbXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> So there was 27.76 seconds spent copying tuples into local memory
> ahead of the quicksort, 2 minutes 56.68 seconds spent actually
> quicksorting, and a trifling 10.32 seconds actually writing the run! I
> bet that the quicksort really didn't use up too much memory bandwidth
> on the system as a whole, since abbreviated keys are used with a cache
> oblivious internal sorting algorithm.

Uh, actually, that isn't so:

LOG: begin index sort: unique = f, workMem = 1048576, randomAccess = f
LOG: bttext_abbrev: abbrev_distinct after 160: 1.000489
(key_distinct: 40.802210, norm_abbrev_card: 0.006253, prop_card:
0.200000)
LOG: bttext_abbrev: aborted abbreviation at 160 (abbrev_distinct:
1.000489, key_distinct: 40.802210, prop_card: 0.200000)

Abbreviation is aborted in all cases that you tested. Arguably this
should happen significantly less frequently with the "C" locale,
possibly almost never, but it makes this case less than representative
of most people's workloads. I think that at least the first several
hundred leading attribute tuples are duplicates.

BTW, roughly what does this CREATE INDEX look like? Is it a composite
index, for example?

It would also be nice to see pg_stats entries for each column being
indexed. Data distributions are certainly of interest here.

Thanks
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-11-29 10:09:46 Re: Using quicksort for every external sort run
Previous Message Pavel Stehule 2015-11-29 08:18:58 Re: proposal: PL/Pythonu - function ereport