Re: GSOC 2018 Project - A New Sorting Routine

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Kefan Yang <starordust(at)gmail(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: GSOC 2018 Project - A New Sorting Routine
Date:	2018-07-13 22:10:18
Message-ID:	CAH2-Wzmj2XstMK58tJ0yEr+0MwpqMU8rfUB0j7GVe=p+yW5rTg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Jul 13, 2018 at 3:04 PM, Kefan Yang <starordust(at)gmail(dot)com> wrote:
> 1. Slow on CREATE INDEX cases.
>
> I am still trying to figure out where the bottleneck is. Is the data pattern
> in index creation very different from other cases? Also, pg_qsort has
> 10%-20% advantage at creating index even on sorted data (faster CPU, N =
> 1000000). This is very strange to me since the two sorting routines execute
> exactly the same code when the input data is sorted.

Yes, the data pattern is different. CREATE INDEX uses heap TID as a
tie-breaker, so it's impossible for any two index tuples to compare as
equal within tuplesort.c, even though they may be equal in other
contexts. This is likely to defeat things like the Bentley-McIlroy
optimization, where keys equal to the pivot are swapped out of the way
during partitioning, which is very effective in the event of many equal
keys.
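
To illustrate the tie-breaker point, here is a minimal sketch (not
PostgreSQL code) that counts how often a comparator reports "equal" with
and without a TID-style tie-breaker. The (key, tid) pairs, the function
name make_counting_cmp, and the use of Python's built-in sort are all
assumptions made for illustration; it shows only the comparator's
behaviour, not the Bentley-McIlroy partitioning itself.

```python
from functools import cmp_to_key


def make_counting_cmp(use_tid_tiebreak):
    """Return (cmp, stats); stats["equal"] counts comparisons that returned 0."""
    stats = {"total": 0, "equal": 0}

    def cmp(a, b):
        # a and b are hypothetical (key, tid) pairs; tid stands in for a heap TID.
        stats["total"] += 1
        result = (a[0] > b[0]) - (a[0] < b[0])
        if result == 0 and use_tid_tiebreak:
            # Tie-breaker: fall back to the unique tid when the keys match.
            result = (a[1] > b[1]) - (a[1] < b[1])
        if result == 0:
            stats["equal"] += 1
        return result

    return cmp, stats


# 1000 tuples with only 4 distinct keys, each carrying a unique tid.
tuples = [(i % 4, i) for i in range(1000)]

cmp_plain, plain = make_counting_cmp(use_tid_tiebreak=False)
sorted(tuples, key=cmp_to_key(cmp_plain))

cmp_tid, with_tid = make_counting_cmp(use_tid_tiebreak=True)
sorted(tuples, key=cmp_to_key(cmp_tid))

print("equal results without tie-breaker:", plain["equal"])
print("equal results with tie-breaker:   ", with_tid["equal"])
```

With the tie-breaker enabled the comparator never returns 0 for two
distinct tuples, so any optimization that depends on detecting equal
keys has nothing to work with.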

(Could also be parallelism, though I suppose you probably accounted for that.)

--
Peter Geoghegan

In response to

Fwd: GSOC 2018 Project - A New Sorting Routine at 2018-07-13 22:04:55 from Kefan Yang

Responses

Re: GSOC 2018 Project - A New Sorting Routine at 2018-07-14 21:20:41 from Tomas Vondra
