Re: Improve planner cost estimations for alternative subplans

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Alexey Bashtanov <bashtanov(at)imap(dot)cc>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Improve planner cost estimations for alternative subplans
Date: 2020-08-17 14:12:39
Message-ID: CAKU4AWoMRzZKk1vPstKTjS7sYeN43j8WtsAZy2pv73vm_E_6dA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 22, 2020 at 9:39 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > Nope. The entire reason why we have that kluge is that we don't know
> > until much later how many times we expect to execute the subplan.
> > AlternativeSubPlan allows the decision which subplan form to use to be
> > postponed till runtime; but when we're doing things like estimating the
> > cost and selectivity of a where-clause, we don't know that.
>
> > Maybe there's some way to recast things to avoid that problem,
> > but I have little clue what it'd look like.
>
> Actually ... maybe it's not that bad. Clearly there would be a
> circularity issue for selectivity estimation, but all the alternatives
> should have the same selectivity. Cost estimation is a different story:
> by the time we need to do cost estimation for a subexpression, we do in
> many cases have an idea how often the subexpression will be executed.
>
>
I read your idea of "ripping out all executor support for AlternativeSubPlan
and instead having the planner replace an AlternativeSubPlan with
the desired specific SubPlan somewhere late in planning, possibly
setrefs.c."
in [1]. I was thinking that if we can do such a replacement sooner,
for example once we know the num_calls for the subplans, Unknown if it
is possible though. If we can, then we can handle the issue here as well.

The attached is a very PoC version, I'm not sure if it is the right
direction
to go. I'm sorry that I still need more time to understand your solution
below but I'm too excited about your original idea.

[1] https://www.postgresql.org/message-id/1992952.1592785225@sss.pgh.pa.us

> I experimented with adding a number-of-evaluations parameter to
> cost_qual_eval, and found that the majority of call sites do have
> something realistic they can pass. The attached very-much-WIP
> patch shows my results so far. There's a lot of loose ends:
>
> * Any call site using COST_QUAL_EVAL_DUMMY_NUM_EVALS is a potential spot
> for future improvement. The only one that seems like it might be
> fundamentally unsolvable is cost_subplan's usage; but that would only
> matter if a subplan's testexpr contains an AlternativeSubPlan, which is
> probably a negligible case in practice. The other ones seem to just
> require more refactoring than I cared to do on a Sunday afternoon.
>
> * I did not do anything for postgres_fdw.c beyond making it compile.
> We can surely do better there, but it might require some rethinking
> of the way that plan costs get cached.
>
> * The merge and hash join costsize.c functions estimate costs of qpquals
> (i.e. quals to be applied at the join that are not being used as merge
> or hash conditions) by computing cost_qual_eval of the whole
> joinrestrictlist and then subtracting off the cost of the merge or hash
> quals. This is kind of broken if we want to use different num_eval
> estimates for the qpquals and the merge/hash quals, which I think we do.
> This probably just needs some refactoring to fix. We also need to save
> the relevant rowcounts in the join Path nodes so that createplan.c can
> do the right thing.
>
> * I think it might be possible to improve the situation in
> get_agg_clause_costs() if we're willing to postpone collection
> of the actual aggregate costs till later. This'd require two
> passes over the aggregate expressions, but that seems like it
> might not be terribly expensive. (I'd be inclined to also look
> at the idea of merging duplicate agg calls at plan time not
> run time, if we refactor that.)
>
> * I had to increase the number of table rows in one updatable_views.sql
> test to keep the plans the same. Without that, the planner realized
> that a seqscan would be cheaper than an indexscan. The result wasn't
> wrong exactly, but it failed to prove that leakproof quals could be
> used as indexquals, so I think we need to keep the plan choice the same.
>
> Anyway, this is kind of invasive, but I think it shouldn't really
> add noticeable costs as long as we save relevant rowcounts rather
> than recomputing them in createplan.c. Is it worth doing? I dunno.
> AlternativeSubPlan is pretty much a backwater, I think --- if it
> were interesting performance-wise to a lot of people, more would
> have been done with it by now.
>
> regards, tom lane
>
>

--
Best Regards
Andy Fan

Attachment Content-Type Size
v1-0001-Convert-the-AlternativeSubplan-to-Subplan-as-soon.patch application/octet-stream 7.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2020-08-17 14:18:10 Re: Parallel bitmap index scan
Previous Message Bharath Rupireddy 2020-08-17 14:11:57 Re: Parallel bitmap index scan