Re: disfavoring unparameterized nested loops

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: disfavoring unparameterized nested loops
Date: 2022-10-02 18:39:49
Message-ID: CAH2-WznOY=SkO4L=DZAZbEC3Jo4fbG_DCY7QOSt7mF=N+eLSjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 2, 2022 at 3:43 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Sep 30, 2022 at 2:24 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > It's not just that the risks are ludicrously high, of course. The
> > potential benefits must *also* be very low. It's both factors,
> > together.
>
> Hmm, maybe. But it also wouldn't surprise me very much if someone can
> come up with a test case where a nested loop with a single row (or
> maybe no rows) on one side or the other and it's significantly faster
> than any alternative plan.

That's certainly possible, but wouldn't the difference all come from
fixed startup costs? If we're talking about a single row, with a
minimal test case, then the potential downside of this more
conservative strategy might indeed amount to something like a 2x or 3x
slowdown, if we look at it in isolation. But why measure it that way?
I think that absolute differences like milliseconds of execution time
are much more relevant.

Real production databases have many queries with very diverse
characteristics -- there is a lot going on at any given moment. The
proportion of queries that will be affected either way by avoiding
unparamaterized nested loop joins is probably going to be very small.
Nobody is going to notice if only a small subset or all queries are
maybe 1 ms or 2 ms slower. As long as it's well within the margin of
noise in 100% of all cases, it really shouldn't matter.

AFAICT the choice is one of "bounded, low upside versus unbounded,
high downside".

> I believe, though, that even if such cases
> exist, they are probably relatively few in number compared to the
> cases where parameterized nested loops hurt, and the savings are
> probably small compared to the multiple-orders-of-magnitude slowdowns
> that you can get when a nested loop goes bad. But they might still be
> relatively large -- 2x, 3x? -- in absolute terms.

I suspect it won't even matter if disallowing unparamaterized nested
loop joins loses on average.

I am reminded of this:
https://en.wikipedia.org/wiki/St._Petersburg_paradox#Expected_utility_theory

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-10-02 19:10:19 Re: Question: test "aggregates" failed in 32-bit machine
Previous Message Justin Pryzby 2022-10-02 18:38:37 Re: [RFC] building postgres with meson - v13