Re: disfavoring unparameterized nested loops

From: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David Rowley" <dgrowleyml(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: disfavoring unparameterized nested loops
Date: 2021-06-22 11:10:28
Message-ID: 1ECE0028-C2F0-43BF-84F1-E3455CE10ED4@amazon.com
Lists: pgsql-hackers

> But making everything slower will be a hard sell, because vast majority of
> workloads already running on Postgres don't have this issue at all, so
> for them it's not worth the expense. Following the insurance analogy,
> selling tornado insurance in Europe is mostly pointless.
>

Agree. I've been surprised NOT to hear complaints from PostgreSQL
customers about a particular "bad" plan choice that was common in other
RDBMS products where large, complex queries were the norm. The situation
arises late in a plan with many joins, where a hash join can be used and
either side is estimated to fit in memory. On one side is a base table
whose cardinality we have statistics for, while the other side's estimated
cardinality is the product of many intermediate estimates, each carrying
error that can compound, and some of which amount to wild guesses
(e.g. what is the selectivity of SUM(x) < 12 ?). If the planner's point
estimate of cardinality says that both sides could fit in memory, a bad
plan is easy to make. As Peter said, [ most ] humans have no trouble
dealing with this kind of situation: they take the risk of being wrong
into account.
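
To make the compounding concrete, here is a toy sketch (plain C, not
planner code; the 1.5x per-estimate error is just an assumed figure):

/*
 * Toy illustration only: if each of k intermediate estimates is off by
 * a factor of 'err', the final cardinality estimate can be off by up to
 * err^k, so a node deep in a many-join plan can look like it fits in
 * memory on paper while being far larger in practice.
 */
#include <stdio.h>
#include <math.h>

int
main(void)
{
    double per_estimate_error = 1.5;    /* each estimate off by 50% */

    for (int joins = 1; joins <= 10; joins++)
        printf("after %2d estimates, worst-case error factor: %8.1f\n",
               joins, pow(per_estimate_error, joins));
    return 0;
}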

So in our world, the useful numbers are 0, 1, measured N, and estimated N,
but we don't distinguish between measured N and estimated N.

But that doesn't mean that OLTP customers would be willing to accept
slightly suboptimal plans to mitigate a risk they don't experience.

> Insurance is also about personal preference / risk tolerance. Maybe I'm
> fine with accepting risk that my house burns down, or whatever ...

Right, and that's why the problem mentioned above is still out there
annoying customers who have complex plans. To them it looks like
an obviously bad plan choice.

Something that might help is to make the planner cost a structure instead
of a single number. Costs of plans that make a "risky" choice are
accumulated separately from costs of plans that make none, and for a given
set of join items you keep the minimum-cost plan of each type. If every
plan eventually makes a risky choice, you simply take the cheapest plan;
but if a risk-free plan exists, the cheapest risk-free plan is chosen, with
a GUC that enables a customer to ignore risk when making this choice. This
is not in the spirit of the hoped-for simple heuristic, and the
classification of plans as risky would itself be heuristic, but in the NLJ
case the cost of an unparameterized NLJ could be deemed risky whenever the
cardinality of the inner relation is not 0, 1, or measured N.
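
As a rough sketch of what that structure and comparison rule might look
like (hypothetical names only -- RiskAwareCost, ignore_plan_risk -- nothing
here is existing PostgreSQL code):

#include <stdbool.h>

/*
 * Hypothetical sketch: carry a "risky" flag alongside the numeric cost,
 * and when comparing candidate plans for the same set of join items,
 * prefer the cheapest risk-free plan when one exists.
 */
typedef struct RiskAwareCost
{
    double  total_cost;     /* conventional cost estimate */
    bool    risky;          /* true if any node relied on a wild guess */
} RiskAwareCost;

/* hypothetical GUC: when true, fall back to pure cost comparison */
static bool ignore_plan_risk = false;

/*
 * Return true if plan cost 'a' should be preferred over 'b'.
 * A risk-free plan beats a risky one unless the GUC disables that rule;
 * otherwise plain cost decides.
 */
static bool
risk_aware_cheaper(const RiskAwareCost *a, const RiskAwareCost *b)
{
    if (!ignore_plan_risk && a->risky != b->risky)
        return !a->risky;
    return a->total_cost < b->total_cost;
}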


