Re: Improve planner cost estimations for alternative subplans

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve planner cost estimations for alternative subplans
Date: 2020-06-22 01:39:37
Message-ID: 1997465.1592789977@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Nope. The entire reason why we have that kluge is that we don't know
> until much later how many times we expect to execute the subplan.
> AlternativeSubPlan allows the decision which subplan form to use to be
> postponed till runtime; but when we're doing things like estimating the
> cost and selectivity of a where-clause, we don't know that.

> Maybe there's some way to recast things to avoid that problem,
> but I have little clue what it'd look like.

Actually ... maybe it's not that bad. Clearly there would be a
circularity issue for selectivity estimation, but all the alternatives
should have the same selectivity. Cost estimation is a different story:
by the time we need to do cost estimation for a subexpression, we do in
many cases have an idea how often the subexpression will be executed.

I experimented with adding a number-of-evaluations parameter to
cost_qual_eval, and found that the majority of call sites do have
something realistic they can pass. The attached very-much-WIP
patch shows my results so far. There's a lot of loose ends:

* Any call site using COST_QUAL_EVAL_DUMMY_NUM_EVALS is a potential spot
for future improvement. The only one that seems like it might be
fundamentally unsolvable is cost_subplan's usage; but that would only
matter if a subplan's testexpr contains an AlternativeSubPlan, which is
probably a negligible case in practice. The other ones seem to just
require more refactoring than I cared to do on a Sunday afternoon.

* I did not do anything for postgres_fdw.c beyond making it compile.
We can surely do better there, but it might require some rethinking
of the way that plan costs get cached.

* The merge and hash join costsize.c functions estimate costs of qpquals
(i.e. quals to be applied at the join that are not being used as merge
or hash conditions) by computing cost_qual_eval of the whole
joinrestrictlist and then subtracting off the cost of the merge or hash
quals. This is kind of broken if we want to use different num_eval
estimates for the qpquals and the merge/hash quals, which I think we do.
This probably just needs some refactoring to fix. We also need to save
the relevant rowcounts in the join Path nodes so that createplan.c can
do the right thing.

* I think it might be possible to improve the situation in
get_agg_clause_costs() if we're willing to postpone collection
of the actual aggregate costs till later. This'd require two
passes over the aggregate expressions, but that seems like it
might not be terribly expensive. (I'd be inclined to also look
at the idea of merging duplicate agg calls at plan time not
run time, if we refactor that.)

* I had to increase the number of table rows in one updatable_views.sql
test to keep the plans the same. Without that, the planner realized
that a seqscan would be cheaper than an indexscan. The result wasn't
wrong exactly, but it failed to prove that leakproof quals could be
used as indexquals, so I think we need to keep the plan choice the same.

Anyway, this is kind of invasive, but I think it shouldn't really
add noticeable costs as long as we save relevant rowcounts rather
than recomputing them in createplan.c. Is it worth doing? I dunno.
AlternativeSubPlan is pretty much a backwater, I think --- if it
were interesting performance-wise to a lot of people, more would
have been done with it by now.

regards, tom lane

Attachment Content-Type Size
better-alternativesubplan-cost-ests-1.patch text/x-diff 50.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-06-22 01:53:25 Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
Previous Message David Rowley 2020-06-22 01:29:31 Re: Get rid of runtime handling of AlternativeSubPlan?