make add_paths_to_append_rel aware of startup cost

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Zhang Mingli <zmlpostgres(at)gmail(dot)com>
Subject: make add_paths_to_append_rel aware of startup cost
Date: 2023-09-06 12:39:56
Message-ID: CAKU4AWrXSkUV=Pt-gRxQT7EbfUeNssprGyNsB=5mJibFZ6S3ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi:

This thread is a refactor of thread [1] for easier communication.

Currently add_paths_to_append_rel overlooked the startup cost for creating
append path, so it may have lost some optimization chances. After a
glance,
the following 4 identifiers can be impacted.

Identifier 1:

SELECT .. FROM v1
UNION ALL
SELECT .. FROM v2
LIMIT 3;

Identifier 2:

SELECT * FROM p .. LIMIT 3; p is a partitioned table.

Identifier 3:
SELECT * FROM p JOIN q using (partkey) LIMIT 3;

If we did the partition-wise-join, then we lost the chances for a better
plan.

Identifier 4: -- EXISTS implies LIMIT 1;
SELECT * FROM foo
WHERE EXISTS
(SELECT 1 FROM append_rel_v_not_pullup_able WHERE xxx);

However, after I completed my patch and wanted to build some real
queries to prove my idea, I just find it is hard to build the case for
Identifier 2/3/4. But the Improvement for Identifier 1 is easy and
my real user case in work is Identifier 1 as well.

So a patch is attached for this case, it will use fractional costs
rather than total costs if needed. The following items needs more
attention during development.

- We shouldn't do the optimization if there are still more tables to join,
the reason is similar to has_multiple_baserels(root) in
set_subquery_pathlist. But the trouble here is we may inherit multiple
levels to build an appendrel, so I have to keep the 'top_relids' all the
time and compare it with PlannerInfo.all_baserels. If they are the same,
then it is the case we want to optimize.

- add_partial_path doesn't consider the startup costs by design, I didn't
rethink too much about this, but the idea of "choose a path which
let each worker produces the top-N tuples as fast as possible" looks
reasonable, and even though add_partial_path doesn't consider startup
cost, it is still possible that childrel keeps more than 1 partial paths
due to any reasons except startup_cost, for example PathKey. then we
still have chances to choose the cheapest fractional path among
them. The same strategy also applies to mixed partial and non-partial
append paths.

- Due to the complexity of add_paths_to_append_rel, 3 arguments have
to be added to get_cheapest_fractional_path...

Path *
get_cheapest_fractional_path(RelOptInfo *rel, double tuple_fraction,
bool allow_parameterized, bool look_partial,
bool must_parallel_safe)

Cases can be improved.

Here is the simplest test case, but it will not be hard to provide more
cases for Identifier 1.

(select * from tenk1 order by hundred)
UNION ALL
(select * from tenk1 order by hundred)
limit 3;

master: 8.096ms.
patched: 0.204ms.

The below user case should be more reasonable for real business.

with a as (select * from t1 join t2..),
b as (select * from t1 join t3 ..)
select * from a union all select * from b
limit 3;

The patch would also have impacts on identifier 2/3/4, even though I can't
make a demo sql which can get benefits from this patch, I also added
some test cases for code coverage purposes.

Any feedback is welcome!

[1]
https://www.postgresql.org/message-id/flat/CAKU4AWqEnzhUTxopVhENC3vs6NnYV32+e6GSBtp1rAv0ZNX=mQ(at)mail(dot)gmail(dot)com

--
Best Regards
Andy Fan

Attachment Content-Type Size
v1-0001-make-add_paths_to_append_rel-aware-of-startup-cos.patch application/octet-stream 28.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Damir Belyalov 2023-09-06 12:49:36 Output affected rows in EXPLAIN
Previous Message jacktby jacktby 2023-09-06 11:46:42 Re: How to add a new pg oid?