Re: Add proper planner support for ORDER BY / DISTINCT aggregates

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pavel Luzanov <p(dot)luzanov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Richard Guo <guofenglinux(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Date: 2022-11-08 01:31:12
Message-ID: CAApHDvpkbvDEDJPNZrUqCsCLVC57ZY8qrPn8nGpQSrn4RYstZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 8 Nov 2022 at 03:53, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> wrote:
> I can't see why an incrementalsort could be more expensive than a full sort,
> using the same presorted path. It looks to me that in that case we should
> always prefer an incrementalsort. Maybe we should bound incremental sorts cost
> to make sure they are never more expensive than the full sort ?

The only thing that I could think of that would cause incremental sort
to be more expensive than sort is the tuplesort_reset() calls that are
performed between sorts. However, I see cost_incremental_sort()
accounts for those already with:

run_cost += 2.0 * cpu_tuple_cost * input_groups;

Also, I see at the top of incremental_sort.sql there's a comment claiming:

-- When we have to sort the entire table, incremental sort will
-- be slower than plain sort, so it should not be used.

I'm just unable to verify that's true by doing the following:

$ echo "select * from (select * from tenk1 order by four) t order by
four, ten;" > bench.sql

$ pgbench -n -f bench.sql -T 60 -M prepared regression | grep -E "^tps"
tps = 102.136151 (without initial connection time)

$ # disable sort so that the test performs Sort -> Incremental Sort rather
$ # than Sort -> Sort
$ psql -c "alter system set enable_sort=0;" regression
$ psql -c "select pg_reload_conf();" regression

$ pgbench -n -f bench.sql -T 60 -M prepared regression | grep -E "^tps"
tps = 112.378761 (without initial connection time)

When I disable sort, the plan changes to use Incremental Sort and
execution becomes faster, not slower like the comment claims it will.
Perhaps this was true during the development of Incremental sort and
then something was changed to speed things up. I do recall reviewing
that patch many years ago and hinting about the invention of
tuplesort_reset(). I don't recall, but I assume the patch must have
been creating a new tuplesort each group before that.

Also, I was looking at add_paths_to_grouping_rel() and I saw that if
presorted_keys > 0 that we'll create both a Sort and Incremental Sort
path. If we assume Incremental Sort is always better when it can be
done, then it seems useless to create the Sort path when Incremental
Sort is possible. When working on making Incremental Sorts work for
window functions I did things that way. Maybe we should just make
add_paths_to_grouping_rel() work the same way.

Regarding the 1.5 factor in cost_incremental_sort(), I assume this is
for skewed groups. Imagine there's 1 huge group and 99 tiny ones.
However, even if that were the case, I imagine the performance would
still be around the same performance as the non-incremental variant of
sort.

I've been playing around with the attached patch which does:

1. Adjusts add_paths_to_grouping_rel so that we don't add a Sort path
when we can add an Incremental Sort path instead. This removes quite a
few redundant lines of code.
2. Removes the * 1.5 fuzz-factor in cost_incremental_sort()
3. Does various other code tidy stuff in cost_incremental_sort().
4. Removes the test from incremental_sort.sql that was ensuring the
inferior Sort -> Sort plan was being used instead of the superior Sort
-> Incremental Sort plan.

I'm not really that 100% confident in the removal of the * 1.5 thing.
I wonder if there's some reason we're not considering that might cause
a performance regression if we're to remove it.

David

Attachment Content-Type Size
adjust_incremental_sort_costings.patch text/plain 9.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-11-08 01:57:35 Re: Add palloc_aligned() to allow arbitrary power of 2 memory alignment
Previous Message Michael Paquier 2022-11-08 01:04:16 Re: Allow file inclusion in pg_hba and pg_ident files