Re: disfavoring unparameterized nested loops

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-21 17:35:42
Message-ID: CA+TgmoZwsGehz-NHpBViVpA8+4RXvQ23+hFdx1KzMKN7LjF6bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 21, 2021 at 1:11 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Mon, Jun 21, 2021 at 9:52 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > I'm not so sure that it is. The point isn't the risk, even if it could
> > > be calculated. The point is the downsides of being wrong are huge and
> > > pretty much unbounded, whereas the benefits of being right are tiny
> > > and bounded. It almost doesn't matter what the underlying
> > > probabilities are.
> >
> > You're throwing around a lot of pejorative adjectives there without
> > having bothered to quantify any of them. This sounds less like a sound
> > argument to me than like a witch trial.
>
> I'm not sure what you mean by pejorative. Isn't what I said about
> downsides and upsides pretty accurate?

Yeah, I don't see why Peter's characterization deserves to be labelled
as pejorative here. A hash join or merge join or parameterized nested
loop can turn out to be slower than some other algorithm, but all of
those approaches have some component that tends to make the asymptotic
cost less than the product of the sizes of the inputs. I don't think
that's true in absolutely every case; for example, if a merge join has
every row duplicated on both sides, it will have to scan every inner
tuple once per outer tuple, just like a nested loop, and the other
algorithms also are going to degrade toward O(NM) performance in the
face of many duplicates. Also, a hash join can be pretty close to that
if it needs a shazillion batches. But in normal cases, any algorithm
other than an unparameterized nested loop tends to read each input
tuple on each side ONCE, so the cost is more like the sum of the input
sizes than the product. And there's nothing pejorative in saying that
N + M can be less than N * M by an unbounded amount. That's just the
facts.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-06-21 17:38:28 Re: disfavoring unparameterized nested loops
Previous Message Tom Lane 2021-06-21 17:19:19 Re: Out-of-bounds (src/backend/utils/misc/queryjumble.c)