Quick Links

Re: Use of additional index columns in rows filtering

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Maxim Ivanov <hi(at)yamlcoder(dot)me>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, markus(dot)winand(at)winand(dot)at
Subject:	Re: Use of additional index columns in rows filtering
Date:	2023-08-03 05:26:50
Message-ID:	CAH2-WzmUEzjAvVfjqRYAE0j7YCgVh5ogKJ4GKopXjSYm55SFHQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, Aug 2, 2023 at 6:32 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I don't dispute the fact that this can only happen when the planner
> believes (with good reason) that the expected cost will be lower. But
> I maintain that there is a novel risk to be concerned about, which is
> meaningfully distinct from the general risk of regressions that comes
> from making just about any change to the planner. The important
> principle here is that we should have a strong bias in the direction
> of making quals into true "Access Predicates" whenever practical.
>
> Yeah, technically the patch doesn't directly disrupt how existing
> index paths get generated. But it does have the potential to disrupt
> it indirectly, by providing an alternative very-similar index path
> that's likely to outcompete the existing one in these cases. I think
> that we should have just one index path that does everything well
> instead.

You can see this for yourself, quite easily. Start by running the
relevant query from the regression tests, which is:

SELECT * FROM tenk1 WHERE thousand = 42 AND (tenthous = 1 OR tenthous
= 3 OR tenthous = 42);

EXPLAIN (ANALYZE, BUFFERS) confirms that the patch makes the query
slightly faster, as expected. I see 7 buffer hits for the bitmap index
scan plan on master, versus only 4 buffer hits for the patch's index
scan. Obviously, this is because we go from multiple index scans
(multiple bitmap index scans) to only one.

But, if I run this insert statement and try the same thing again,
things look very different:

insert into tenk1 (thousand, tenthous) select 42, i from
generate_series(43, 1000) i;

(Bear in mind that we've inserted rows that don't actually need to be
selected by the query in question.)

Now the master branch's plan works in just the same way as before --
it has exactly the same overhead (7 buffer hits). Whereas the patch
still gets the same risky plan -- which now blows up. The plan now
loses by far more than it could ever really hope to win by: 336 buffer
hits. (It could be a lot higher than this, even, but you get the
picture.)

Sure, it's difficult to imagine a very general model that captures
this sort of risk, in the general case. But you don't need a degree in
actuarial science to understand that it's inherently a bad idea to
juggle chainsaws -- no matter what your general risk tolerance happens
to be. Some things you just don't do.

--
Peter Geoghegan

In response to

Re: Use of additional index columns in rows filtering at 2023-08-03 01:32:40 from Peter Geoghegan

Responses

Re: Use of additional index columns in rows filtering at 2023-08-03 11:20:55 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Yuya Watari	2023-08-03 06:08:32	Re: [PoC] Reducing planning time when tables have many partitions
Previous Message	Amul Sul	2023-08-03 05:23:10	Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression