Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Arne Roland <A(dot)Roland(at)index(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Date: 2022-01-14 00:39:30
Message-ID: 46c3469d-4183-9816-f109-2b572b7fa851@enterprisedb.com
Lists: pgsql-hackers

On 1/13/22 21:12, Arne Roland wrote:
>  Hi!
>
>> From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
>> Subject: Re: PATCH: generate fractional cheapest paths in
> generate_orderedappend_path
>>  
>> test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
>> ORDER BY id1 ASC, id2 ASC LIMIT 10;
>>                                    QUERY PLAN
>>
>>
> ------------------------------------------------------------------------------
>>   Limit
>>     ->  Merge Left Join
>>           Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
>>           ->  Append
>>                 ->  Index Only Scan using fract_t0_id1_id2_idx on
>>                                           fract_t0 x_1
>>                 ->  Incremental Sort
>>                       Sort Key: x_2.id1, x_2.id2
>>                       Presorted Key: x_2.id1
>>                       ->  Index Scan using fract_t1_pkey on fract_t1 x_2
>>           ->  Materialize
>>                 ->  Append
>>                       ->  Incremental Sort
>>                             Sort Key: y_1.id1, y_1.id2
>>                             Presorted Key: y_1.id1
>>                             ->  Index Scan using fract_t0_pkey on
>>                                                  fract_t0 y_1
>>                                   Index Cond: (id1 = id1)
>>                                   Filter: (id2 = id2)
>>                       ->  Incremental Sort
>>                             Sort Key: y_2.id1, y_2.id2
>>                             Presorted Key: y_2.id1
>>                             ->  Index Scan using fract_t1_pkey on
>>                                                  fract_t1 y_2
>>                                   Index Cond: (id1 = id1)
>>                                   Filter: (id2 = id2)
>> (23 rows)
>> [...]
>> So that seems reasonable
>
> Maybe I'm just slow, but that doesn't seem reasonable to me. It doesn't
> look like a valid plan to me. Sure, all the nodes are arranged the way I'd
> like them to be. But what are the id1/id2 bounds we have in the index and
> filter conditions?
>

I'm not claiming the plan is 100% correct, and you may have a point
about the index condition / filter in the materialize branch.

But the overall plan shape (with incremental sort nodes on both sides)
seems reasonable to me. The materialize node is expected (incremental
sort does not support rescans cheaply).

Maybe that's not any cheaper than just doing merge join on the first
column, and filter on the second. But we should be able to decide that
based on cost, I think.

>> [...]but there's a couple issues too:
>>
>> 1) Planning works (hence we can run explain), but execution fails
>> because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
>> some reason. I haven't looked into the details, but I'd guess we need to
>> pass a different rel to create_incrementalsort_path, not childrel.
>
> In case my above concern is valid, maybe the executor is just as
> confused as I am. Such conditions should generate VarSlots, no? If so,
> where should they come from?
>

Yeah, something like that.

> Sadly I don't have time to debug that in depth today.
>
>> 2) enable_partitionwise_join=on seems to cause some confusion, because it
>> results in picking a different plan with higher cost. But that's clearly
>> not because of this patch.
>
> Short version: I'd neither conceptually expect costs to be lower in any
> case, nor would that be desirable, because our estimates aren't perfect.
>
> Long version: What do you mean by confusion? The plan I get with the
> patch doesn't seem too confusing to me. Generally something like that is
> to be expected. enable_partitionwise_join changes the way this planning
> works by effectively rewriting the entire query rule-based, so we end up
> with a completely different query. I'd in particular expect slightly
> different startup costs.
> So if we activate this, we consider completely different plans; I
> struggle to come up with a meaningful example where there is any overlap
> at all. Thus it doesn't surprise me conceptually.
> From practical experience I'd say: if they are about the same plan, the
> cost estimates work somewhat okish.
> If we change the way we compose the nodes, we sometimes end up with
> radically different costs for doing the same thing. While I suspect
> there are a lot of corner cases causing this, I've seen quite a few
> where it was due to the fact that our planner often has insufficient
> statistics and has to take a guess. This has gotten way better in
> recent years, but in particular for non-trivial joins it's still a
> problem in practice.
>

By confusion I meant that if you plan the query with partitionwise join
enabled, you get a plan with cost X, and if you disable it you get a
different plan with cost Y, where (Y < X). That's somewhat unexpected,
because disabling it seems to simply reduce the set of plans we explore.

But as you say, enable_partitionwise_join kinda "switches" between two
planning modes. Not sure why we don't try building both paths and decide
based on costs.
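For illustration, the effect can be observed directly. This is a
hypothetical sketch (it assumes the fract_t tables from the example
further down in this mail; the actual costs depend on your data):

```sql
-- Hypothetical sketch: compare the total costs the planner reports
-- with and without partitionwise join for the same query.
SET enable_partitionwise_join = on;
EXPLAIN SELECT * FROM fract_t x JOIN fract_t y USING (id1, id2);
-- reports some total cost X

SET enable_partitionwise_join = off;
EXPLAIN SELECT * FROM fract_t x JOIN fract_t y USING (id1, id2);
-- if this reports a cost Y < X, enabling the GUC picked a worse plan
```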

>> 3) I'm still a bit skeptical about the cost of this implementation - it
>> builds the incrementalsort path, just to throw most of the paths away.
>> It'd be much better to just calculate the cost, which should be much
>> cheaper, and only do the heavy work for the one "best" path.
>
> Maybe we should profile this to get a rough estimate of how much time we
> spend building these incremental sort paths. From a code perspective
> it's non-trivial to me to see where the time is lost.
>

TBH I haven't really done any profiling, but I wouldn't be surprised if
it got somewhat expensive for a large number of child relations,
especially if there are a couple of indexes on each. We do something
similar for nestloop (see initial_cost_nestloop).
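To make that idea concrete, here's a rough C-style pseudocode sketch of
the "cost first, build once" pattern, mirroring what
initial_cost_nestloop/final_cost_nestloop do. This is not actual
PostgreSQL code: the loop, the variable names, and the dummy-path trick
are my own; only cost_incremental_sort, create_incremental_sort_path,
and add_path are real functions, and I'm abbreviating their signatures
from memory:

```
/*
 * Pseudocode sketch (hypothetical): cheaply estimate the incremental
 * sort cost for each child path, remember the cheapest, and build the
 * actual IncrementalSortPath node only once, for the winner.
 */
Path       *best_subpath = NULL;
Cost        best_total = 0;

foreach(lc, childrel->pathlist)
{
    Path       *subpath = (Path *) lfirst(lc);
    Path        sort_path;      /* dummy, used for costing only */

    cost_incremental_sort(&sort_path, root, pathkeys, presorted_keys,
                          subpath->startup_cost, subpath->total_cost,
                          subpath->rows, subpath->pathtarget->width,
                          0.0, work_mem, -1.0);

    if (best_subpath == NULL || sort_path.total_cost < best_total)
    {
        best_subpath = subpath;
        best_total = sort_path.total_cost;
    }
}

/* only now pay for constructing the real path node */
if (best_subpath != NULL)
    add_path(childrel, (Path *)
             create_incremental_sort_path(root, childrel, best_subpath,
                                          pathkeys, presorted_keys,
                                          -1.0));
```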

>> 4) One thing I did not realize before is what pathkeys we even consider
>> here. Imagine you have two tables:
>>
>>     CREATE TABLE t1 (a int, b int, primary key (a));
>>     CREATE TABLE t2 (a int, b int, primary key (a));
>>
>> and query
>>
>>     SELECT * FROM t1 JOIN t2 USING (a,b);
>>
>> It seems reasonable to also consider pathkeys (a,b) because that'd make
>> e.g. mergejoin much cheaper, right? But no, we'll not do that - we only
>> consider pathkeys that at least one child relation has, so in this case
>> it's just (a). Which means we'll never consider incremental sort for
>> this particular example.
>>
>> It's a bit strange, because it's enough to create index on (a,b) for
>> just one of the relations, and it'll suddenly consider building
>> incremental sort on both sides.
>
> I don't find that surprising, because the single index *might* make the
> incremental sort cheaper for the join *without* considering any external
> sort order.
> So we would be switching up the incremental sort and the mergejoin in
> case we need to sort anyway. That would mean also considering the sort
> order that might be relevant on the outside. Sounds like an interesting
> idea for a later patch.
>

I'm not sure it depends on the incremental sort. I suspect in some cases
it might be faster to fully sort the merge join inputs, even if none of
the input paths has suitable pathkeys.

For example, if you do

... FROM t1 JOIN t2 USING (a,b) ...

but there are only indexes on (a), maybe sorting on (a,b) would win e.g.
if there's a lot of duplicate values in (a)?
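Concretely, such a case might look like this (a hypothetical sketch; the
table names, sizes, and the duplicate factor are made up, so whether the
full sort actually wins would have to be checked against the costs):

```sql
-- Hypothetical sketch: indexes on (a) only, and ~100k duplicates per
-- value of a, so a full sort (or incremental sort) on (a,b) might beat
-- merge-joining on a alone and filtering on b.
CREATE TABLE t1 (a int, b int);
CREATE TABLE t2 (a int, b int);
CREATE INDEX ON t1 (a);
CREATE INDEX ON t2 (a);
INSERT INTO t1 SELECT i / 100000, i FROM generate_series(0, 999999) s(i);
INSERT INTO t2 SELECT i / 100000, i FROM generate_series(0, 999999) s(i);
ANALYZE t1;
ANALYZE t2;
EXPLAIN SELECT * FROM t1 JOIN t2 USING (a, b);
```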

I was thinking about this variation on example from the committed patch:

CREATE TABLE fract_t (id1 BIGINT, id2 BIGINT)
PARTITION BY RANGE (id1);

CREATE TABLE fract_t0 PARTITION OF fract_t
FOR VALUES FROM ('0') TO ('10');

CREATE TABLE fract_t1 PARTITION OF fract_t
FOR VALUES FROM ('10') TO ('20');

CREATE INDEX ON fract_t(id1);

INSERT INTO fract_t (id1, id2)
SELECT i/100000, i FROM generate_series(0, 1999999) s(i);

ANALYZE fract_t;

-- not interested in nestloop/hashjoin paths for now
set enable_hashjoin = off;
set enable_nestloop = off;
set max_parallel_workers_per_gather = 0;

EXPLAIN (COSTS OFF)
SELECT * FROM fract_t x JOIN fract_t y USING (id1, id2) order by id1;

which is now planned like this:

                      QUERY PLAN
-----------------------------------------------------
 Merge Join
   Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
   ->  Sort
         Sort Key: x.id1, x.id2
         ->  Append
               ->  Seq Scan on fract_t0 x_1
               ->  Seq Scan on fract_t1 x_2
   ->  Materialize
         ->  Sort
               Sort Key: y.id1, y.id2
               ->  Append
                     ->  Seq Scan on fract_t0 y_1
                     ->  Seq Scan on fract_t1 y_2
(13 rows)

But maybe a plan like this might be better:

                      QUERY PLAN
-----------------------------------------------------
 Merge Join
   Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
   ->  Incremental Sort
         Sort Key: x.id1, x.id2
         Presorted Key: x.id1
         ->  Append
               ->  Index Scan on fract_t0 x_1
               ->  Index Scan on fract_t1 x_2
   ->  Materialize
         ->  Incremental Sort
               Sort Key: y.id1, y.id2
               Presorted Key: y.id1
               ->  Append
                     ->  Index Scan on fract_t0 y_1
                     ->  Index Scan on fract_t1 y_2
or maybe even Incremental Sort + (Merge) Append on top. Which is what I
was trying to achieve with the experimental patch.

FWIW I did briefly look at the Merge Join + Incremental Sort plan too,
and it seems we don't consider incremental sorts there either. AFAICS
add_paths_to_joinrel just calls sort_inner_and_outer, which determines
interesting pathkeys in select_outer_pathkeys_for_merge, and I guess
that only considers pathkeys usable for a full sort. In any case, we
don't actually add sort paths - we construct sort plans by calling
make_sort in create_mergejoin_plan. So there's not much chance for an
incremental sort at all. That's kinda unfortunate, I guess. It's one of
the non-pathified places that we ignored while adding incremental sort.

>> I don't plan to pursue this further at this point, so I pushed the first
>> part because that's a fairly simple improvement over what we have now.
>> But it seems like a fairly promising area for improvements.
>
> I think 1) is pretty important, so we should sort that out sooner rather
> than later. Apart from that: :+1:
> Thank you!
>

I agree it's worth investigating and experimenting with. We may end up
realizing those plans are not worth it, but we won't know until we try.

It may require replacing some of the hard-coded logic in createplan.c
with constructing regular alternative paths. IIRC we even did some of
this work in the incremental sort patch at some point, but then ripped
that out to keep the patch smaller / simpler ... need to look at it again.

Are you interested / willing to do some of this work?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
