Re: Planner making bad choice in alternative subplan decision

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Planner making bad choice in alternative subplan decision
Date: 2020-09-29 20:39:34
Message-ID: CAApHDvqXMsdC8xqZ65moSrYcRaHTGGND6=7dxSYGGTGsa7E19g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 30 Sep 2020 at 04:42, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> > On Tue, 29 Sep 2020 at 12:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> The idea I'd had was to adjust make_subplan and cost_subplan to estimate
> >> EXIST cases on the basis of either 50% retrieval (same as ANY/ALL) or
> >> maybe full retrieval if you want to be pessimistic. I've not had time
> >> to try it out though.
>
> > Here's a patch which does that. There are no regression test failures.
>
> (1) IMO, make_subplan should be adjusted in the same way; or at least,
> if it isn't, the comments therein need to be changed to explain why
> it's okay for it to be out of sync with cost_subplan.

All the code seems to be doing is deciding on the tuple_fraction to
pass to the planner. Perhaps it's worth putting a note there to
mention why cost_subplan() does it differently, but that might be
better explained in cost_subplan()

> (2) Does this fix the plan choice problem you started with?

Yes

> > I'm trying to anticipate areas that this could cause a regression. I
> > think generally a hashed subplan should win in about all cases where
> > we lookup all of the items in the table. The places where it could
> > cause regression are, where we have to load way more into the hash
> > table than we're going to lookup. Assuming roughly the same costs for
> > the subplan for hashing and the non-hashed subplan, then the crossover
> > point will be about 2 lookups (2 x 50%). I do wonder if 50% is a bit
> > too big a number. We did ask the planner for a fast startup plan, so
> > there is perhaps some reasoning to make the cost multiplier somewhere
> > between 50% and 1 / nrows. I'm just not sure what that should be.
> > There's very little to go on cost-wise, or even heuristically. So
> > consider the patch still a discussion level patch.
>
> One way to get some data is to see what the threshold multiplier is
> to change the plan choice in an EXISTS that is currently planned
> wrongly.

It'll choose the hashed plan if we divide by just half the rows.

I've input the costs into the attached spreadsheet. Changing cell D2
is what I'm proposing to change. The patch effectively changes this to
=F2*0.5 which makes the cost of the non-hashed plan 850,000 (vs 195
for the hashed plan). Changing it to =F2/C2*2 (i.e divide by half the
rows) would cause the planner to choose the hashed plan.

David

Attachment Content-Type Size
subplan_costing.ods application/vnd.oasis.opendocument.spreadsheet 10.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Vladimir Sitnikov 2020-09-29 20:48:24 Re: BLOB / CLOB support in PostgreSQL
Previous Message Heikki Linnakangas 2020-09-29 20:31:07 Re: Yet another fast GiST build