Re: Use of additional index columns in rows filtering

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, markus(dot)winand(at)winand(dot)at
Subject: Re: Use of additional index columns in rows filtering
Date: 2023-08-03 05:26:50
Message-ID: CAH2-WzmUEzjAvVfjqRYAE0j7YCgVh5ogKJ4GKopXjSYm55SFHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
>
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.

You can see this for yourself, quite easily. Start by running the
relevant query from the regression tests, which is:

SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous
= 3 OR tenthous = 42);

EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
slightly faster, as expected. I see 7 buffer hits for the bitmap index
scan plan on master, versus only 4 buffer hits for the patch's index
scan. Obviously, this is because we go from multiple index scans
(multiple bitmap index scans) to only one.

But, if I run this insert statement and try the same thing again,
things look very different:

insert into tenk1 (thousand, tenthous) select 42, i from
generate_series(43, 1000) i;

(Bear in mind that we've inserted rows that don't actually need to be
selected by the query in question.)

Now the master branch's plan works in just the same way as before --
it has exactly the same overhead (7 buffer hits). Whereas the patch
still gets the same risky plan -- which now blows up. The plan now
loses by far more than it could ever really hope to win by: 336 buffer
hits. (It could be a lot higher than this, even, but you get the
picture.)

Sure, it's difficult to imagine a very general model that captures
this sort of risk, in the general case. But you don't need a degree in
actuarial science to understand that it's inherently a bad idea to
juggle chainsaws -- no matter what your general risk tolerance happens
to be. Some things you just don't do.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yuya Watari 2023-08-03 06:08:32 Re: [PoC] Reducing planning time when tables have many partitions
Previous Message Amul Sul 2023-08-03 05:23:10 Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression