|From:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>|
|To:||James Coleman <jtc331(at)gmail(dot)com>|
|Cc:||Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>|
|Subject:||Re: [PATCH] Incremental sort (was: PoC: Partial sort)|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On Tue, Jul 09, 2019 at 03:37:03AM +0200, Tomas Vondra wrote:
>Notice that cost of the second plan is almost double the first one. That
>means 0004 does not even generate the first plan, i.e. there are cases
>where we don't try to add the explicit sort before passing the path to
>And I think I know why is that - while gather_grouping_paths() tries to
>add explicit sort below the gather merge, there are other places that
>call generate_gather_paths() that don't do that. In this case it's
>probably apply_scanjoin_target_to_paths() which simply builds
> parallel (seq|index) scan + gather merge
>and that's it. The problem is likely the same - the code does not know
>which pathkeys are "interesting" at that point. We probably need to
>teach planner to do this.
I've looked into this again, and yes - that's the reason. I've added
generate_useful_gather_paths() which is a wrapper on top of
generate_gather_paths(). It essentially does what 0001 patch did directly
in generate_gather_paths() so it's more like create_grouping_paths().
And that does the trick - we now generate the cheaper paths, and I don't
see any crashes in regression tests etc.
I still suspect we already have code doing similar checks whether pathkeys
might be useful somewhere. I've looked into pathkeys.c and elsewhere but
Attached is a slightly modified patch series:
1) 0001 considers incremental sort paths in various places (adds the new
generate_useful_gather_paths and modifies places calling create_sort_path)
2) 0002 and 0003 are fixes I mentioned before
3) 0004 adds a new GUC force_incremental_sort that (when set to 'on')
tries to nudge the optimizer into using incremental sort by essentially
making it free (i.e. using startup/total costs of the subpath). I've found
this useful when trying to force incremental sorts into plans where it may
not be the best strategy.
I won't have time to hack on this over the next ~2 weeks, but I'll try to
respond to questions when possible.
>FWIW tweaking all the create_sort_path() places to also consider adding
>incremental sort is a bit tedious and invasive, and it almost doubles
>the amount of repetitive code. It's OK for experiment like this, but we
>should try handling this in a nicer way (move to a separate function
>that does both, or something like that).
This definitely needs more work. We need to refactor it in some way, e.g.
have a function that would consider both explicit sort (on the cheapest
path) and incremental sort (on all paths), and call it from all those
places. Haven't tried it, though.
There's also a couple more places where we do create_sort_path() and don't
consider incremental sort yet - window functions, distinct etc.
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
|Next Message||Thomas Munro||2019-07-09 12:16:49||Re: [HACKERS] Cached plans and statement generalization|
|Previous Message||Joe Conway||2019-07-09 12:01:35||Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)|