From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2019-07-21 11:34:22
Message-ID: 20190721113422.hjubqs45xxpnldn7@development
Lists: pgsql-hackers

On Sat, Jul 20, 2019 at 07:37:08PM -0400, James Coleman wrote:
>> ..
>>
>> Yes, you're right - an extra sort node would be necessary. But I don't
>> think that changes the idea behind that example. The question is whether
>> the extra sorts below the gather would be cheaper than doing a large sort
>> on top of it - but I don't see why that wouldn't be the case, and if we
>> only need a couple of rows from the beginning ...
>
>Ah, I see. Right now we get:
>
> Limit
>   ->  Incremental Sort
>         Sort Key: a, (sum(expensive_function(c)))
>         Presorted Key: a
>         ->  Finalize GroupAggregate
>               Group Key: a, b
>               ->  Gather Merge
>                     Workers Planned: 2
>                     ->  Partial GroupAggregate
>                           Group Key: a, b
>                           ->  Parallel Index Scan using t_a_b on t
>
>even with the parallel additions to create_ordered_paths() -- those
>additions don't actually add any new parallel paths because
>input_rel->partial_pathlist != NIL is false (I'm not sure why yet), so
>if we want (if I understand correctly) something more like:
>

Well, in this particular case it's fairly simple, I think. We only call
create_ordered_paths() from the grouping planner, on the upper relation
that represents the result of the GROUP BY. That relation only contains
the finalized result (after the Finalize GroupAggregate), so it can't
contain partial aggregation paths or any other partial path. In this
case partial_pathlist == NIL seems guaranteed.
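
To illustrate, the check has roughly this shape (a simplified sketch,
not verbatim planner.c code):

    /*
     * Parallel (sorted) paths are only considered when the input rel
     * still has partial paths. The GROUP BY upper rel contains only
     * the finalized aggregate result, so its partial_pathlist is NIL
     * and this branch is skipped entirely.
     */
    if (input_rel->consider_parallel &&
        input_rel->partial_pathlist != NIL)
    {
        /* build a Sort / Incremental Sort on the cheapest partial
         * paths, then put a Gather Merge on top */
    }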

So maybe we should not be looking at GROUP BY queries at all - those
probably can't hit this particular code path. We need a different type
of upper relation; UNION ALL, for example, should hit this code, I
think.

So maybe try

select * from t union all select * from t order by a, b limit 10;

and that will hit this condition with partial_pathlist != NIL.

I don't know if such queries can benefit from incremental sort, though.
There are other upper relations too:

typedef enum UpperRelationKind
{
    UPPERREL_SETOP,             /* result of UNION/INTERSECT/EXCEPT, if any */
    UPPERREL_PARTIAL_GROUP_AGG, /* result of partial grouping/aggregation,
                                 * if any */
    UPPERREL_GROUP_AGG,         /* result of grouping/aggregation, if any */
    UPPERREL_WINDOW,            /* result of window functions, if any */
    UPPERREL_DISTINCT,          /* result of "SELECT DISTINCT", if any */
    UPPERREL_ORDERED,           /* result of ORDER BY, if any */
    UPPERREL_FINAL              /* result of any remaining top-level actions */
    /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;

>I'm still struggling to understand though how another incremental sort
>below the gather-merge would actually be able to help us. For one I'm
>not sure it would be less expensive,

That's a good question. I think in general we agree that if we can get
the gather merge to produce data sorted the way the operation above it
needs (the first operation that can't run in parallel mode), that's
probably beneficial.

So this pattern seems reasonable:

-> something
   -> non-parallel operation
      -> gather merge
         -> incremental sort
            -> something

And it's likely faster, especially when the operations above it can
leverage the lower startup cost - say, when there's an explicit LIMIT.
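
The lower startup cost comes from the fact that an incremental sort
only needs to sort one prefix group before it can start returning
rows. Roughly, in pseudocode (hypothetical helper names, not the
actual executor code):

    while ((row = fetch_next_row()) != NULL)
    {
        if (!same_prefix_keys(row, current_group))
        {
            /* the prefix changed: sort the finished group and emit
             * it right away, before reading any further input */
            sort_buffer();
            emit_buffer();
            reset_buffer();
            current_group = prefix_of(row);
        }
        add_to_buffer(row);
    }
    sort_buffer();              /* sort and emit the final group */
    emit_buffer();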

I think the question is where exactly we should add the incremental
sort. It's quite possible some of the places we modified are redundant,
at least for some queries. I'm not sure.

>but more importantly I'm not sure how we could do that and maintain
>correctness. Wouldn't a per-worker sum be useless for sorting, since
>it has no predictable impact on the ordering of the total sum?

Yes, you're right - that wouldn't be correct. The reason I've been
thinking about such examples is that distributed databases do such
things in some cases, and we might do that too with partitioning.

Consider a simple example:

create table t (a int, b int, c int) partition by hash (a);
create table t0 partition of t for values with (modulus 4, remainder 0);
create table t1 partition of t for values with (modulus 4, remainder 1);
create table t2 partition of t for values with (modulus 4, remainder 2);
create table t3 partition of t for values with (modulus 4, remainder 3);
insert into t select mod(i,1000), i, i from generate_series(1,1000000) s(i);
analyze t;
select a, count(b) from t group by a order by a, count(b) limit 10;

In this case we might do a plan similar to what I proposed, assuming
each worker gets to execute on a different partition (because then we
know each worker will see distinct groups, thanks to the partitioning).

But AFAIK we don't do such optimizations yet, so it's probably just a
distraction.

>Describing that got me thinking of similar cases where ordering of the
>partial aggregate would (in theory) be a correct partial sort for the
>total ordering, and it seems like min() and max() would be. So I ran
>the same experiment with that instead of sum(), but, you guessed it,
>input_rel->partial_pathlist != NIL is false again, so we don't add any
>parallel paths in create_ordered_paths().
>

Right. That's because with aggregation the grouping planner only sees
the total (finalized) result, not the partial paths. We need a
different upper rel to exercise that code path.
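
Roughly, the call sequence in grouping_planner() has this shape (an
abbreviated sketch, not verbatim source):

    /*
     * By the time the ORDER BY step runs, current_rel is the
     * UPPERREL_GROUP_AGG rel. Partial aggregate paths live on a
     * separate rel (UPPERREL_PARTIAL_GROUP_AGG), so the rel handed
     * to create_ordered_paths() has partial_pathlist == NIL.
     */
    if (parse->groupClause || parse->hasAggs)
        current_rel = create_grouping_paths(root, current_rel, ...);

    if (root->sort_pathkeys)
        current_rel = create_ordered_paths(root, current_rel, ...);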

>I'm leaning towards thinking that considering parallel incremental sorts in
>create_ordered_paths() won't add value. But I also feel like this
>whole project has me jumping into the deep end of the pool again (this
>time the planner), so I'm still picking up a lot of pieces for how all
>this fits together, and as such I don't have a great intuitive grasp
>yet of how this particular part of the planning process maps to the
>kind of queries and plans we consider. All that to say: if you have
>further thoughts, I'm happy to look into it, but right now I'm not
>seeing anything.
>

Understood. FWIW I'm not particularly familiar with this code (or with
which places are supposed to work together), so I definitely agree it
may be overwhelming, especially when it's only a part of a larger
patch.

I wonder if we're approaching this wrong. Maybe we should not
reverse-engineer queries for the various places, but instead start with
a set of queries that we want to optimize, and then identify which
places in the planner need to be modified.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
