Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-07-09 12:11:25
Message-ID: 20190709121125.5bpaxgtd5wl7aghy@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 09, 2019 at 03:37:03AM +0200, Tomas Vondra wrote:
>
> ...
>
>Notice that cost of the second plan is almost double the first one. That
>means 0004 does not even generate the first plan, i.e. there are cases
>where we don't try to add the explicit sort before passing the path to
>generate_gather_paths().
>
>And I think I know why is that - while gather_grouping_paths() tries to
>add explicit sort below the gather merge, there are other places that
>call generate_gather_paths() that don't do that. In this case it's
>probably apply_scanjoin_target_to_paths() which simply builds
>
> parallel (seq|index) scan + gather merge
>
>and that's it. The problem is likely the same - the code does not know
>which pathkeys are "interesting" at that point. We probably need to
>teach planner to do this.
>

I've looked into this again, and yes - that's the reason. I've added
generate_useful_gather_paths() which is a wrapper on top of
generate_gather_paths(). It essentially does what 0001 patch did directly
in generate_gather_paths() so it's more like create_grouping_paths().

And that does the trick - we now generate the cheaper paths, and I don't
see any crashes in regression tests etc.

I still suspect we already have code doing similar checks whether pathkeys
might be useful somewhere. I've looked into pathkeys.c and elsewhere but
no luck.

Attached is a slightly modified patch series:

1) 0001 considers incremental sort paths in various places (adds the new
generate_useful_gather_paths and modifies places calling create_sort_path)

2) 0002 and 0003 are fixes I mentioned before

3) 0004 adds a new GUC force_incremental_sort that (when set to 'on')
tries to nudge the optimizer into using incremental sort by essentially
making it free (i.e. using startup/total costs of the subpath). I've found
this useful when trying to force incremental sorts into plans where it may
not be the best strategy.

I won't have time to hack on this over the next ~2 weeks, but I'll try to
respond to questions when possible.

>
>FWIW tweaking all the create_sort_path() places to also consider adding
>incremental sort is a bit tedious and invasive, and it almost doubles
>the amount of repetitive code. It's OK for experiment like this, but we
>should try handling this in a nicer way (move to a separate function
>that does both, or something like that).
>

This definitely needs more work. We need to refactor it in some way, e.g.
have a function that would consider both explicit sort (on the cheapest
path) and incremental sort (on all paths), and call it from all those
places. Haven't tried it, though.

There's also a couple more places where we do create_sort_path() and don't
consider incremental sort yet - window functions, distinct etc.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-fix-pathkey-processing-in-generate_gather_p-20190709.patch text/plain 23.6 KB
0002-fix-costing-in-cost_incremental_sort-20190709.patch text/plain 1.1 KB
0003-fix-explain-in-parallel-mode-20190709.patch text/plain 947 bytes
0004-add-force_incremental_sort-GUC-20190709.patch text/plain 2.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-07-09 12:16:49 Re: [HACKERS] Cached plans and statement generalization
Previous Message Joe Conway 2019-07-09 12:01:35 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)