Re: disfavoring unparameterized nested loops

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-21 11:27:10
Message-ID: CAFBsxsFAhMq-nKqY8e3xeTyasyNz0vUC44J84HaCJFj-7PzO6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 15, 2021 at 8:00 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> In my experience, the most common reason that the planner chooses
> non-parameterized nested loops wrongly is when there's row
> underestimation that says there's just going to be 1 row returned by
> some set of joins. The problem often comes when some subsequent join
> is planned and the planner sees the given join rel only produces one
> row. The cheapest join method we have to join 1 row is Nested Loop.
> So the planner just sticks the 1-row join rel on the outer side
> thinking the executor will only need to scan the inner side of the
> join once. When the outer row count blows up, then we end up scanning
> that inner side many more times. The problem is compounded when you
> nest it a few joins deep
>
> Most of the time when I see that happen it's down to either the
> selectivity of some correlated base-quals being multiplied down to a
> number low enough that we clamp the estimate to be 1 row. The other
> case is similar, but with join quals.

If an estimate is lower than 1, that should be a red flag that Something Is
Wrong. This is kind of a crazy idea, but what if we threw it back the other
way by computing 1 / est , and clamping that result to 2 <= res < 10 (or
100 or something)? The theory is, the more impossibly low it is, the more
wrong it is. I'm attracted to the idea of dealing with it as an estimation
problem and not needing to know about join types. Might have unintended
consequences, though.

Long term, it would be great to calculate something about the distribution
of cardinality estimates, so we can model risk in the estimates.

--
John Naylor
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-06-21 11:32:24 Re: Remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values
Previous Message Daniel Gustafsson 2021-06-21 11:23:56 Re: SSL/TLS instead of SSL in docs