Re: disfavoring unparameterized nested loops

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-21 16:31:14
Message-ID: CAH2-Wzn8Os5KuuonDS+jkvpijgw4QtBdJ-igkHtZPFRjqJ337Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 21, 2021 at 8:55 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There are certainly cases where the optimizer can prove (in principle;
> it doesn't do so today) that a plan node will produce at most one row.
> They're hardly uncommon either: an equality comparison on a unique
> key, or a subquery with a simple aggregate function, come to mind.

That sounds like it might be useful in general.
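To make Tom's examples concrete, here is a minimal sketch (the table and column names are hypothetical, not anything from this thread):

-- Hypothetical table; the only thing that matters is the unique constraint.
CREATE TABLE accounts (
    account_id bigint PRIMARY KEY,
    balance    numeric NOT NULL
);

-- Equality comparison on a unique key: the scan can return at most one row,
-- something the planner could in principle prove rather than merely estimate.
SELECT balance FROM accounts WHERE account_id = 42;

-- Subquery with a simple (ungrouped) aggregate: it always yields exactly one row.
SELECT account_id
FROM accounts
WHERE balance > (SELECT avg(balance) FROM accounts);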

> In such cases, not only is this choice not reckless, but it's provably
> superior to a hash join. So in the end this gets back to the planning
> risk factor that we keep circling around but nobody quite wants to
> tackle.

Let's assume for the sake of argument that we really have to have that
additional infrastructure to move forward with the idea. (I'm not sure
if it's possible in principle to use infrastructure like that for some
of the cases that Robert has in mind, but for now I'll assume that it
is both possible and a practical necessity.)

Even granting that working assumption, I don't see what it changes at a
fundamental level. You've merely come up with a slightly more specific
definition of the class of plans that are "reckless": the original
provisional definition gets refined to exclude certain "clearly not
reckless" cases (I think). But the definition of "reckless" is no less
squishy than what we started out with.

> I'd be a lot happier if this proposal were couched around some sort
> of estimate of the risk of the outer side producing more than the
> expected number of rows. The arguments so far seem like fairly lame
> rationalizations for not putting forth the effort to do that.

I'm not so sure that it is. The point isn't the risk, even if it could
be calculated. The point is that the downsides of being wrong are huge
and pretty much unbounded, whereas the benefits of being right are tiny
and bounded. It almost doesn't matter what the underlying
probabilities are.
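
To make that asymmetry concrete, here is a rough sketch; the schema and the row counts are made up purely for illustration:

-- Hypothetical schema, only to make the trade-off concrete.
CREATE TABLE orders (
    order_id  bigint PRIMARY KEY,
    placed_at timestamptz NOT NULL
);
CREATE TABLE order_lines (
    order_id bigint NOT NULL,
    line_no  int    NOT NULL
);

-- Suppose the planner estimates that the filter below matches a single order
-- and therefore picks an unparameterized nested loop, with a full scan of
-- order_lines on the inner side.
EXPLAIN
SELECT o.order_id, l.line_no
FROM orders o
JOIN order_lines l ON l.order_id = o.order_id
WHERE o.placed_at = '2021-06-21 16:31:14';

-- If the one-row estimate is right, the nested loop beats a hash join only by
-- the one-time cost of building the hash table: a small, bounded win. If the
-- estimate is wrong and the filter really matches 100,000 rows, the inner
-- scan is repeated 100,000 times, so the loss grows without bound as the
-- misestimate grows, while the hash join would have paid its build cost once
-- either way.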

To be clear, I'm not arguing against modelling risk. I'm just not sure
that it's useful to think of this problem as truly a problem of risk.

--
Peter Geoghegan
