Re: [HACKERS] [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Incremental sort
Date: 2018-03-10 15:42:04
Message-ID: CAPpHfduY0FWfwZQGpp4Kppc8epjC2WRUdijDSZJ7+mf9GhL3Fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 8, 2018 at 2:49 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> OK, the revised patch works fine - I've done a lot of testing and
> benchmarking, and not a single segfault or any other crash.
>
> Regarding the benchmarks, I generally used queries of the form
>
> SELECT * FROM (SELECT * FROM t ORDER BY a) foo ORDER BY a,b
>
> with the first sort done in various ways:
>
> * regular Sort node
> * indexes with Index Scan
> * indexes with Index Only Scan
>
> and all these three options with and without LIMIT (the limit was set to
> 1% of the source table).
>
> I've also varied parallelism (max_parallel_workers_per_gather was set to
> either 0 or 2), work_mem (from 4MB to 256MB) and data set size (tables
> from 1000 rows to 10M rows).
>
> All of this may seem like an overkill, but I've found a couple of
> regressions thanks to that.
>
> The full scripts and results are available here:
>
> https://github.com/tvondra/incremental-sort-tests
>
> The queries actually executed are a bit more complicated, to eliminate
> overhead due to data transfer to client etc. The same approach was used
> in the other sorting benchmarks we've done in the past.
>
> I'm attaching results for two scales - 10k and 10M rows, preprocessed
> into .ods format. I haven't looked at the other scales yet, but I don't
> expect any surprises there.
>
> Each .ods file contains raw data for one of the tests (matching the .sh
> script filename), pivot table, and comparison of durations with and
> without the incremental sort.
>
> In general, I think the results look pretty impressive. Almost all the
> comparisons are green, which means "faster than master" - usually by
> tens of percent (without limit), or by up to ~95% (with LIMIT).
>
> There are a couple of regressions in two cases sort-indexes and
> sort-indexes-ios.
>
> Oh the small dataset this seems to be related to the number of groups
> (essentially, number of distinct values in a column). My assumption is
> that there is some additional overhead when "switching" between the
> groups, and with many groups it's significant enough to affect results
> on these tiny tables (where master only takes ~3ms to do the sort). The
> slowdown seems to be
>
> On the large data set it seems to be somehow related to both work_mem
> and number of groups, but I didn't have time to investigate that yet
> (there are explain analyze plans in the results, so feel free to look).
>
> In general, I think this looks really nice. It's certainly awesome with
> the LIMIT case, as it allows us to leverage indexes on a subset of the
> ORDER BY columns.
>

Thank you very much for testing and benchmarking. I'll investigate
the regressions you found.

> Now, there's a caveat in those tests - the data set is synthetic and
> perfectly random, i.e. all groups equally likely, no correlations or
> anything like that.
>
> I wonder what is the "worst case" scenario, i.e. how to construct a data
> set with particularly bad behavior of the incremental sort.

I think that depends on the reason of bad behavior of incremental sort.
For example, our quick sort implementation behaves very good on
presorted data. But, incremental sort appears to be not so good in
this case as Heikki showed upthread. That prompted me to test
presorted datasets (which appeared to be "worst case") more intensively.
But I suspect that regressions you found have another reason, and
correspondingly "worst case" would be also different.
When I'll investigate the reason of regressions, I'll try to construct
"worst case" as well.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message legrand legrand 2018-03-10 15:43:47 Re: [PROPOSAL] timestamp informations to pg_stat_statements
Previous Message Michael Banck 2018-03-10 15:09:31 Re: Online enabling of checksums