Re: disfavoring unparameterized nested loops

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-21 15:55:48
Message-ID: 1648206.1624290948@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Geoghegan <pg(at)bowt(dot)ie> writes:
> The heuristic that has the optimizer flat out avoid an
> unparameterized nested loop join is justified by the belief that
> that's fundamentally reckless. Even though we all agree on that much,
> I don't know when it stops being reckless and starts being "too risky
> for me, but not fundamentally reckless". I think that that's worth
> living with, but it isn't very satisfying.

There are certainly cases where the optimizer can prove (in principle;
it doesn't do so today) that a plan node will produce at most one row.
They're hardly uncommon either: an equality comparison on a unique
key, or a subquery with a simple aggregate function, come to mind.

In such cases, not only is an unparameterized nested loop not reckless,
but it's provably superior to a hash join. So in the end this gets back
to the planning risk factor that we keep circling around but nobody
quite wants to tackle.
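
To make the "provably superior" point concrete, here is a minimal sketch
using a toy cost model. The function names, formulas, and constants are
assumptions made up for illustration; they are not PostgreSQL's actual
costing code. The cost of scanning the outer side is left out because it
is the same for both join methods.

```python
# Toy cost model; the constants and formulas are illustrative assumptions,
# not PostgreSQL's actual costing.

def nestloop_cost(outer_rows, inner_scan_cost):
    """Unparameterized nested loop: rescan the whole inner side per outer row."""
    return outer_rows * inner_scan_cost


def hashjoin_cost(outer_rows, inner_rows, inner_scan_cost,
                  build_cost_per_row=0.01, probe_cost_per_row=0.01):
    """Hash join: scan the inner side once, build a hash table, probe per outer row."""
    return (inner_scan_cost
            + inner_rows * build_cost_per_row
            + outer_rows * probe_cost_per_row)


if __name__ == "__main__":
    # Compare both methods as the actual outer row count grows.
    for outer_rows in (1, 10, 1000):
        nl = nestloop_cost(outer_rows, inner_scan_cost=100.0)
        hj = hashjoin_cost(outer_rows, inner_rows=1000, inner_scan_cost=100.0)
        print(f"outer_rows={outer_rows}: nestloop={nl:.2f} hashjoin={hj:.2f}")
```

Under these assumptions, with at most one outer row the nested loop scans
the inner side at most once and never pays for building a hash table, so
it cannot lose. Its cost grows linearly with every additional outer row,
which is where the risk comes from when the row count is only an estimate.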

I'd be a lot happier if this proposal were couched around some sort
of estimate of the risk of the outer side producing more than the
expected number of rows. The arguments so far seem like fairly lame
rationalizations for not putting forth the effort to do that.
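
One way to picture such a risk estimate is to cost each plan over a
distribution of possible outer row counts instead of a single point
estimate. The sketch below is hypothetical: the scenario lists, the cost
formulas, and the names expected_cost and choose_plan are assumptions for
illustration, not anything the planner does today.

```python
# Hypothetical risk-aware plan choice; all numbers are made-up assumptions.

def nestloop_cost(outer_rows, inner_scan_cost=100.0):
    """Unparameterized nested loop: one full inner scan per outer row."""
    return outer_rows * inner_scan_cost


def hashjoin_cost(outer_rows, inner_scan_cost=100.0, build_cost=10.0,
                  probe_cost_per_row=0.01):
    """Hash join: one inner scan, one hash build, one probe per outer row."""
    return inner_scan_cost + build_cost + outer_rows * probe_cost_per_row


def expected_cost(cost_fn, scenarios):
    """Probability-weighted cost over (probability, outer_rows) scenarios."""
    return sum(p * cost_fn(rows) for p, rows in scenarios)


def choose_plan(scenarios):
    """Return the name of the plan with the lower expected cost."""
    costs = {
        "nestloop": expected_cost(nestloop_cost, scenarios),
        "hashjoin": expected_cost(hashjoin_cost, scenarios),
    }
    return min(costs, key=costs.get)


# Outer side proven to yield at most one row (e.g. equality on a unique key).
PROVABLY_ONE_ROW = [(1.0, 1)]
# Outer side merely estimated at one row, with a 10% chance of a bad misestimate.
ESTIMATED_ONE_ROW = [(0.9, 1), (0.1, 10_000)]

if __name__ == "__main__":
    print(choose_plan(PROVABLY_ONE_ROW))
    print(choose_plan(ESTIMATED_ONE_ROW))
```

With these made-up numbers the provable case favors the nested loop, while
a small chance of a large misestimate is enough to tip the estimated case
toward the hash join. The hard part in practice is obtaining a credible
distribution in the first place.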

regards, tom lane
