Re: disfavoring unparameterized nested loops

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Benjamin Coutu <ben(dot)coutu(at)zeyos(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: disfavoring unparameterized nested loops
Date: 2022-09-30 03:59:49
Message-ID: CAApHDvq2bCCyA51VTgiszk_+16mJ-xhSNtkLQjJ1zijA9Ym8ng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 30 Sept 2022 at 13:06, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I like the idea of just avoiding unparameterized nested loop joins
> altogether when an "equivalent" hash join plan is available because
> it's akin to an execution-time mitigation, despite the fact that it
> happens during planning. While it doesn't actually change anything in
> the executor, it is built on the observation that we have virtually
> everything to gain and nothing to lose during execution, no matter
> what happens.

I'm not sure if it's a good idea to assume that performing
non-parameterised Nested Loops when we shouldn't is the only shape of
plan that causes us problems.

We also have the case where we assume early start-up plans are
favourable. For example:

SELECT * FROM t WHERE a = 1 ORDER BY b LIMIT 10;

where we have two indexes, one on t(a) and another on t(b).

Should we use the t(b) index and filter out the rows that don't match
a = 1 and hope we get 10 a=1 rows soon in the t(b) index? or do we use
t(a) and then perform a sort? Best case for using the t(b) index is
that we find 10 a=1 rows in the first 10 rows of the index scan, the
worst case is that there are no rows with a=1.

Having something coded into the cost model is a more generic way of
addressing this issue. Providing we design the cost model correctly,
we'd be able to address future issues we discover using which ever
cost model infrastructure that we design for this.

I understand that what you propose would be a fast way to fix this
issue. However, if we went and changed the join path creation code to
not add non-parameterised nested loop paths when other paths exist,
then how could we ever dare to put that code back again when we come
up with a better solution?

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-09-30 04:04:43 Re: Suppressing useless wakeups in walreceiver
Previous Message Nathan Bossart 2022-09-30 03:51:41 Re: Suppressing useless wakeups in walreceiver