disfavoring unparameterized nested loops

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: disfavoring unparameterized nested loops
Date: 2021-06-15 17:09:38
Message-ID: CA+TgmoYtWXNpj6D92XxUfjT_YFmi2dWq1XXM9EY-CRcr2qmqbg@mail.gmail.com
Lists: pgsql-hackers

From time to time, someone tells me that they've configured
enable_nestloop=false in postgresql.conf, which is a pretty bad idea
since there are a significant number of cases where such plans are the
only reasonable way of executing some query. However, it's no great
secret that PostgreSQL's optimizer sometimes produces nested loops
that are very, very, very slow, generally because it has grossly
underestimated the cardinality of the inner side of the nested loop.
The frustration which users experience as a result of such events is
understandable.
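
Just to spell out why flipping that GUC is such a blunt instrument: as
far as I know, enable_nestloop=off doesn't remove nested loops from the
search space, it just adds disable_cost to their estimates. So the
planner still falls back to a nested loop when nothing else is
possible, but it will also steer around perfectly good nested loop
plans whenever any alternative exists, however bad. A toy model of that
behavior (made-up numbers, not planner code):

#include <stdbool.h>
#include <stdio.h>

#define DISABLE_COST 1.0e10     /* same order as the planner's disable_cost */

static double
toy_nestloop_cost(double raw_cost, bool nestloop_enabled)
{
    /* enable_nestloop=off just piles a huge penalty onto the estimate */
    return nestloop_enabled ? raw_cost : raw_cost + DISABLE_COST;
}

int
main(void)
{
    /*
     * With the GUC off and any alternative available, even a terrible
     * hash or merge join (cost 1e6 here) beats a genuinely cheap
     * nested loop.
     */
    printf("penalized nestloop %.3g vs alternative %.3g\n",
           toy_nestloop_cost(50.0, false), 1e6);

    /*
     * But when a nested loop is the only way to execute the join, the
     * penalized path is still chosen, so turning the GUC off doesn't
     * make queries fail; it just distorts the cases where there was a
     * real choice to make.
     */
    printf("penalized nestloop, no alternative: %.3g\n",
           toy_nestloop_cost(50.0, false));
    return 0;
}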

I read https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf
today and found out that the authors of that paper did something a bit
more nuanced which, in their experiments, was very successful. It
sounds like what they basically did was disable unparameterized nested
loops. They argue that such nested loops stand to gain very little as
compared with a hash join, but that they might turn out to lose a lot
if the cardinality estimation is inaccurate, and they present
experimental results to back up those claims. One observation that the
paper makes along the way is that every system they tested is more
likely to underestimate the cardinality of joins than to overestimate
it, and that this tendency becomes more pronounced as the size of the
join planning problem increases. On reflection, I believe this matches
my experience, and it makes sense that it should be so, since it
occasionally happens that the join selectivity estimate is essentially
zero, and a bigger join problem is more likely to have at least one
such case. On the other hand, the join selectivity estimate can never
be +infinity. Hence, it's more important in general for a database
system to be resilient against underestimates than to be resilient
against overestimates. Being less willing to choose unparameterized
nested loops is one way to move in that direction.
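
To make the asymmetry concrete, here is a toy cost model (the constants
are arbitrary; these are not the planner's real formulas). An
unparameterized nested loop rescans the inner side once per outer row,
so its true cost scales with the product of the two row counts, while a
hash join scales roughly with their sum. Underestimating the inner side
by a factor of a million therefore costs the nested loop about six
orders of magnitude and the hash join almost nothing:

#include <stdio.h>

/* Arbitrary per-row constants, chosen only to make the point. */
static double
nestloop_cost(double outer_rows, double inner_rows)
{
    /* the inner side is recomputed once per outer row */
    return outer_rows * inner_rows;
}

static double
hashjoin_cost(double outer_rows, double inner_rows)
{
    /* build the hash table once, then probe it once per outer row */
    return 2.0 * inner_rows + 1.5 * outer_rows;
}

int
main(void)
{
    double      outer_rows = 1e6;
    double      est_inner = 1.0;    /* what the planner believes */
    double      real_inner = 1e6;   /* what actually comes out */

    printf("estimated: NL %.3g vs HJ %.3g\n",
           nestloop_cost(outer_rows, est_inner),
           hashjoin_cost(outer_rows, est_inner));
    printf("actual:    NL %.3g vs HJ %.3g\n",
           nestloop_cost(outer_rows, real_inner),
           hashjoin_cost(outer_rows, real_inner));
    return 0;
}

On the estimated numbers the nested loop wins by a modest margin, while
on the actual numbers it loses catastrophically, which is exactly the
"little to gain, much to lose" trade-off the paper describes.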

How to do that is not very clear. One very simple thing we could do
would be to introduce enable_nestloop=parameterized or
enable_unparameterized_nestloop=false. That is a pretty blunt instrument,
but the authors of the paper seem to have done that with positive
results, so maybe it's actually not a dumb idea. Another approach
would be to introduce a large fuzz factor for such nested loops, e.g.
keep them only if the cost estimate is better than the comparable hash
join plan by at least a factor of N (or if no such plan is being
generated for some reason). I'm not very confident that this would
actually do what we want, though. In the problematic cases, a nested
loop is likely to look extremely cheap, so just imagining that the
cost might be higher is not very protective. Perhaps a better approach
would be something like: if the estimated number of inner rows is less
than 100, then re-estimate the cost of this approach and of the best
available hash join on the assumption that there are 100 inner rows.
If the nested loop still wins, keep it; if it loses under that
assumption, throw it out. I think it's likely that this approach would
eliminate a large number of highly risky nested loop plans, probably
even with s/100/10/g, without causing many other problems (except
perhaps increased planner CPU consumption ... but maybe that's not too
bad).
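
Here is roughly what I have in mind, written as a self-contained toy
rather than as a patch against the real cost code; the function names,
the constant, and the cost formulas are all invented for illustration:

#include <stdbool.h>
#include <stdio.h>

#define MIN_ASSUMED_INNER_ROWS 100.0    /* or 10, per the s/100/10/g idea */

/* Toy stand-ins for the real cost functions, as in the sketch above. */
static double
nestloop_cost(double outer_rows, double inner_rows)
{
    return outer_rows * inner_rows;
}

static double
hashjoin_cost(double outer_rows, double inner_rows)
{
    return 2.0 * inner_rows + 1.5 * outer_rows;
}

/*
 * Keep an unparameterized nested loop only if it still beats the best
 * available hash join after padding the inner-side estimate up to
 * MIN_ASSUMED_INNER_ROWS.
 */
static bool
keep_unparameterized_nestloop(double outer_rows, double est_inner_rows)
{
    double      padded = est_inner_rows;

    if (padded < MIN_ASSUMED_INNER_ROWS)
        padded = MIN_ASSUMED_INNER_ROWS;

    return nestloop_cost(outer_rows, padded) <
        hashjoin_cost(outer_rows, padded);
}

int
main(void)
{
    /* Big outer side, inner side estimated at 1 row: reject the NL. */
    printf("1e6 outer x 1-row inner estimate: %s\n",
           keep_unparameterized_nestloop(1e6, 1.0) ? "keep" : "discard");

    /* One-row outer side: the NL wins even at 100 inner rows, so keep it. */
    printf("1 outer x 1-row inner estimate:   %s\n",
           keep_unparameterized_nestloop(1.0, 1.0) ? "keep" : "discard");
    return 0;
}

The point is just that the comparison gets re-run under a pessimistic
floor on the inner row count, so an unparameterized nested loop
survives only if it would still have been the right choice had the
estimate been badly wrong.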

Just to be clear, I do understand that there are cases where no Hash
Join is possible, but anything we do here could be made to apply only
when a hash join is in fact possible. We could also think about making
the point of comparison the best other plans of any sort rather than a
hash join specifically, which feels a little more principled but might
actually be worse. When a Nested Loop is a stupid idea, it's stupid
precisely because the inner side is big and we could've avoided
recomputing it over and over by using a Hash Join instead, not because
some Merge Join based plan turns out to be better. I mean, it is
possible that some Merge Join plan does turn out to be better, but
that's not rage-inducing in the same way. Nobody looks at a
complicated join plan that happened to use a Nested Loop and says
"obviously, this is inferior to a merge join," or if they do, they're
probably full of hot air. But people look at complicated join plans
that happen to use a Nested Loop and say "obviously, this is inferior
to a hash join" *all the time*, and assuming the inner path is
unparameterized, they are very often correct.
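
Putting the two guards from the last couple of paragraphs together, the
overall shape of the check might look something like this, again as a
self-contained toy with invented names rather than anything resembling
real planner code:

#include <stdbool.h>
#include <stdio.h>

#define MIN_ASSUMED_INNER_ROWS 100.0

typedef struct JoinCandidate
{
    bool        inner_is_parameterized; /* inner side depends on the outer row? */
    bool        hashjoin_possible;      /* is there a comparable hash join at all? */
    double      outer_rows;
    double      est_inner_rows;
} JoinCandidate;

/* Toy cost formulas, as in the earlier sketches. */
static double nestloop_cost(double o, double i) { return o * i; }
static double hashjoin_cost(double o, double i) { return 2.0 * i + 1.5 * o; }

static bool
consider_nestloop(const JoinCandidate *c)
{
    double      padded;

    if (c->inner_is_parameterized)
        return true;            /* parameterized nested loops aren't the problem case */
    if (!c->hashjoin_possible)
        return true;            /* no hash join to compare against, so don't reject */

    padded = c->est_inner_rows;
    if (padded < MIN_ASSUMED_INNER_ROWS)
        padded = MIN_ASSUMED_INNER_ROWS;
    return nestloop_cost(c->outer_rows, padded) <
        hashjoin_cost(c->outer_rows, padded);
}

int
main(void)
{
    JoinCandidate risky = {false, true, 1e6, 1.0};
    JoinCandidate no_hj = {false, false, 1e6, 1.0};

    printf("unparameterized, HJ possible:  %s\n",
           consider_nestloop(&risky) ? "keep" : "discard");
    printf("unparameterized, no HJ exists: %s\n",
           consider_nestloop(&no_hj) ? "keep" : "discard");
    return 0;
}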

Thoughts? I'd be particularly curious to hear about any cases anyone
knows about where an unparameterized nested loop and a hash join with
batches=1 are both possible and where the unparameterized nested loop
is *much* cheaper than the hash join.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com
