Re: disfavoring unparameterized nested loops

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-21 20:42:12
Message-ID: CAH2-WzkGuG5TzBjamt4Y2t0acbi_gPzLwUTdFBGOvf_enUGzqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 21, 2021 at 10:14 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Hmm, maybe I need to see an example of the sort of plan shape that you
> have in mind. To me it feels like a comparison on a unique key ought
> to use a *parameterized* nested loop. And it's also not clear to me
> why a nested loop is actually better in a case like this. If the
> nested loop iterates more than once because there are more rows on the
> outer side, then you don't want to have something on the inner side
> that might be expensive, and either an aggregate or an unparameterized
> search for a unique value are potentially quite expensive. Now if you
> put a materialize node on top of the inner side, then you don't have
> to worry about that problem, but how much are you saving at that point
> vs. just doing a hash join?

I suspected that that was true, but even that doesn't seem like the
really important thing. While it may be true that the simple heuristic
can't be quite as simple as we'd hoped at first, ISTM that this is
ultimately not much of a problem. The basic fact remains: some more or
less simple heuristic makes perfect sense, and should be adapted.

This conclusion is counterintuitive because it's addressing a very
complicated problem with a very simple solution. However, if we lived
in a world where things that sound too good to be true always turned
out to be false, we'd also live in a world where optimizers were
completely impractical and useless. Optimizers have that quality
already, whether or not we officially acknowledge it.

> I don't understand how to generate a risk assessment or what we ought
> to do with it if we had one. I don't even understand what units we
> would use. We measure costs using abstract cost units, but those
> abstract cost units are intended to be a proxy for runtime. If it's
> not the case that a plan that runs for longer has a higher cost, then
> something's wrong with the costing model or the settings. In the case
> of risk, the whole thing seems totally subjective. We're talking about
> the risk that our estimate is bogus, but how do we estimate the risk
> that we don't know how to estimate?

Clearly we need a risk estimate for our risk estimate!

> The other thing is - the risk of a particular path doesn't matter in
> an absolute sense, only a relative one. In the particular case I'm on
> about here, we *know* there's a less-risky alternative.

Exactly! This, a thousand times.

This reminds me of how people behave in the real world. In the real
world people deal with this without too much difficulty. Everything is
situational and based on immediate trade-offs, with plenty of
uncertainty at every step. For example, if you think that there is
even a tiny chance of a piece of fruit being poisonous, you don't eat
the piece of fruit -- better to wait until lunchtime. This is one of
the *easiest* decisions I can think of, despite the uncertainty.
(Except perhaps if you happen to be in danger of dying of starvation,
in which case it might be a very different story. And so on.)

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-06-21 20:52:37 Re: disfavoring unparameterized nested loops
Previous Message Tom Lane 2021-06-21 19:03:12 Re: disfavoring unparameterized nested loops