Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Arne Roland <A(dot)Roland(at)index(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: generate fractional cheapest paths in generate_orderedappend_path
Date: 2022-01-12 22:43:04
Message-ID: ed041b8e-1440-a09a-3632-85b3985df598@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pushed, after clarifying the comments a bit.

I also looked into what would it take to consider incremental paths, and
in principle it doesn't seem all that complicated. The attached PoC
patch extends get_cheapest_fractional_path_for_pathkeys() to optionally
build incremental sort on paths if needed. There are two GUCs that make
experimenting simpler:

* enable_fractional_paths - disables fractional paths entirely, i.e. we
get behavior without the part I already pushed

* enable_fractional_incremental_paths - disables the incremental sort
part added by the attached patch

With this, I get this plan (see the test in partitionwise_join.sql)

test=# EXPLAIN (COSTS OFF)
test-# SELECT * FROM fract_t x LEFT JOIN fract_t y USING (id1, id2)
ORDER BY id1 ASC, id2 ASC LIMIT 10;
QUERY PLAN

------------------------------------------------------------------------------
Limit
-> Merge Left Join
Merge Cond: ((x.id1 = y.id1) AND (x.id2 = y.id2))
-> Append
-> Index Only Scan using fract_t0_id1_id2_idx on
fract_t0 x_1
-> Incremental Sort
Sort Key: x_2.id1, x_2.id2
Presorted Key: x_2.id1
-> Index Scan using fract_t1_pkey on fract_t1 x_2
-> Materialize
-> Append
-> Incremental Sort
Sort Key: y_1.id1, y_1.id2
Presorted Key: y_1.id1
-> Index Scan using fract_t0_pkey on
fract_t0 y_1
Index Cond: (id1 = id1)
Filter: (id2 = id2)
-> Incremental Sort
Sort Key: y_2.id1, y_2.id2
Presorted Key: y_2.id1
-> Index Scan using fract_t1_pkey on
fract_t1 y_2
Index Cond: (id1 = id1)
Filter: (id2 = id2)
(23 rows)

instead of

QUERY PLAN

------------------------------------------------------------------------------
Limit
-> Incremental Sort
Sort Key: x.id1, x.id2
Presorted Key: x.id1
-> Merge Left Join
Merge Cond: (x.id1 = y.id1)
Join Filter: (x.id2 = y.id2)
-> Append
-> Index Scan using fract_t0_pkey on fract_t0 x_1
-> Index Scan using fract_t1_pkey on fract_t1 x_2
-> Materialize
-> Append
-> Index Scan using fract_t0_pkey on
fract_t0 y_1
-> Index Scan using fract_t1_pkey on
fract_t1 y_2
(14 rows)

i.e. the incremental sorts moved below the merge join (and the cost is
lower, but that's not shown here).

So that seems reasonable, but there's a couple issues too:

1) Planning works (hence we can run explain), but execution fails
because of segfault in CheckVarSlotCompatibility - it gets NULL slot for
some reason. I haven't looked into the details, but I'd guess we need to
pass a different rel to create_incrementalsort_path, not childrel.

2) enable_partitionwisejoin=on seems to cause some confusion, because it
results in picking a different plan with higher cost. But that's clearly
not because of this patch.

3) I'm still a bit skeptical about the cost of this implementation - it
builds the incrementalsort path, just to throw most of the paths away.
It'd be much better to just calculate the cost, which should be much
cheaper, and only do the heavy work for the one "best" path.

4) One thing I did not realize before is what pathkeys we even consider
here. Imagine you have two tables:

CREATE TABLE t1 (a int, b int, primary key (a));
CREATE TABLE t2 (a int, b int, primary key (a));

and query

SELECT * FROM t1 JOIN t2 USING (a,b);

It seems reasonable to also consider pathkeys (a,b) because that's make
e.g. mergejoin much cheaper, right? But no, we'll not do that - we only
consider pathkeys that at least one child relation has, so in this case
it's just (a). Which means we'll never consider incremental sort for
this particular example.

It's a bit strange, because it's enough to create index on (a,b) for
just one of the relations, and it'll suddenly consider building
incremental sort on both sides.

I don't plan to pursue this further at this point, so I pushed the first
part because that's a fairly simple improvement over what we have now.
But it seems like a fairly promising area for improvements.

Also, the non-intuitive effects of enabling partition-wise joins (i.e.
picking plans with higher estimated costs) is something worth exploring,
I guess. But that's a separate issue.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Consider-incremental-sort-in-generate_orderedappend_.patch text/x-patch 9.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergey Shinderuk 2022-01-12 23:01:24 Re: Improve error handling of HMAC computations and SCRAM
Previous Message John Naylor 2022-01-12 22:33:17 Re: A qsort template