Re: New IndexAM API controlling index vacuum strategies

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-08 18:57:01
Message-ID: CA+TgmoakKFXwUv1Cx2mspUuPQHzYF74BfJ8koF5YdgVLCvhpwA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 10:17 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> * No need to change MaxHeapTuplesPerPage for now, since that only
> really makes sense in cases that heavily involve bottom-up deletion,
> where we care about the *concentration* of LP_DEAD line pointers in
> heap pages (and not just the absolute number in the entire table),
> which is qualitative, not quantitative (somewhat like bottom-up
> deletion).
>
> The change to MaxHeapTuplesPerPage that Masahiko has proposed does
> make sense -- there are good reasons to increase it. Of course there
> are also good reasons to not do so. I'm concerned that we won't have
> time to think through all the possible consequences.

Yes, I agree that it's good to postpone this to a future release, and
that thinking through the consequences is not so easy. One possible
consequence that I'm concerned about is sequential scan performance.
For an index scan, you just jump to the line pointer you want and then
go get the tuple, but a sequential scan has to loop over all the line
pointers on the page, and skipping a lot of dead ones can't be
completely free. A small increase in MaxHeapTuplesPerPage probably
wouldn't matter, but the proposed increase of roughly 7x (291 -> 2042)
is a bit scary. It's also a little hard to believe that letting almost
50% of the total space on the page get chewed up by the line pointer
array is going to be optimal. If that happens to every page while the
amount of data stays the same, the table must almost double in size.
That's got to be bad. The whole thing would be more appealing if there
were some way to exert exponentially increasing back-pressure on the
length of the line pointer array - that is, make it so that the longer
the array is already, the less willing we are to extend it further.
But I don't really see how to do that.
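
As a rough check on the arithmetic in the paragraph above, here is a minimal sketch that reproduces the current limit of 291 from the page layout and shows why a page that is half line pointers implies a table of twice the size. The constants assume a default 8 kB block size on a 64-bit build; they and the function names are illustrative and do not come from the message.

```python
# Assumed constants for a default PostgreSQL build (8 kB pages, 64-bit).
BLCKSZ = 8192            # page size in bytes
PAGE_HEADER_SIZE = 24    # SizeOfPageHeaderData
LINE_POINTER_SIZE = 4    # sizeof(ItemIdData)
TUPLE_HEADER_SIZE = 24   # MAXALIGN(SizeofHeapTupleHeader): 23 rounded up to 8


def max_heap_tuples_per_page() -> int:
    """Current limit: each tuple costs one line pointer plus a minimal header."""
    usable = BLCKSZ - PAGE_HEADER_SIZE
    return usable // (LINE_POINTER_SIZE + TUPLE_HEADER_SIZE)


def table_growth_factor(line_pointer_fraction: float) -> float:
    """Relative table size when a given fraction of every page is line pointers.

    If a fraction f of each page is unavailable for tuple data, the same
    data needs 1 / (1 - f) times as many pages.
    """
    if not 0.0 <= line_pointer_fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - line_pointer_fraction)


if __name__ == "__main__":
    print("MaxHeapTuplesPerPage:", max_heap_tuples_per_page())
    for fraction in (0.1, 0.25, 0.5):
        print(f"{fraction:.0%} line pointers ->",
              f"{table_growth_factor(fraction):.2f}x table size")
```

The growth factor is not linear: the closer the line pointer array gets to filling the page, the faster the table has to grow to hold the same data.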

Also, at the risk of going on and on, line pointer array bloat is very
hard to eliminate once it happens. We never even try to shrink the
line pointer array, and if the last TID in the array is still in use,
it wouldn't be possible anyway, assuming the table has at least one
non-BRIN index. Index page splits are likewise irreversible, but
creating a new index and dropping the old one is still less awful than
having to rewrite the table.
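
To illustrate why a single in-use line pointer at the end of the array blocks any shrinking, here is a toy model. It assumes that index entries refer to tuples by (block, offset), so an item's position in the array can never change and only trailing unused slots could ever be given back. The `LpState` enum and `shrinkable_length` are hypothetical names for this sketch, not PostgreSQL code.

```python
from enum import Enum
from typing import Sequence


class LpState(Enum):
    """Simplified line pointer states (hypothetical model)."""
    UNUSED = 0
    NORMAL = 1
    REDIRECT = 2
    DEAD = 3      # may still be referenced by index entries


def shrinkable_length(line_pointers: Sequence[LpState]) -> int:
    """Shortest length the array could be truncated to.

    Slots cannot be moved, because their offsets are part of the TIDs
    stored in indexes, so only a run of UNUSED slots at the very end
    can be dropped.
    """
    length = len(line_pointers)
    while length > 0 and line_pointers[length - 1] is LpState.UNUSED:
        length -= 1
    return length


if __name__ == "__main__":
    U, N = LpState.UNUSED, LpState.NORMAL
    bloated = [N] + [U] * 500 + [N]   # one live tuple at the very end
    print(shrinkable_length(bloated), "of", len(bloated), "slots must stay")
```

In this model the 500 unused slots in the middle are reusable for new tuples, but the array itself stays at full length for as long as the last slot is in use.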

Another thing to consider is that MaxHeapTuplesPerPage is used to size
some stack-allocated arrays, especially the stack-allocated
PruneState. I thought for a while about this and I can't really see
why it would be a big problem, even with a large increase in
MaxHeapTuplesPerPage, so I'm just mentioning this in case it makes
somebody else think of something I've missed.
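
For a sense of scale, the sketch below estimates the stack footprint of a PruneState-like struct at both limits. The field layout is an assumption modelled on pruneheap.c from around PostgreSQL 13 (two 32-bit XIDs, three int counters, three OffsetNumber arrays, one bool array); it may not match the tree exactly, and `prune_state_type` is a hypothetical helper.

```python
import ctypes


def prune_state_type(max_tuples: int) -> type:
    """Build a ctypes struct approximating PruneState for a given limit.

    Assumed layout, not copied from the source tree.
    """
    class PruneState(ctypes.Structure):
        _fields_ = [
            ("new_prune_xid", ctypes.c_uint32),       # TransactionId
            ("latestRemovedXid", ctypes.c_uint32),    # TransactionId
            ("nredirected", ctypes.c_int),
            ("ndead", ctypes.c_int),
            ("nunused", ctypes.c_int),
            # OffsetNumber is a 16-bit unsigned integer
            ("redirected", ctypes.c_uint16 * (max_tuples * 2)),
            ("nowdead", ctypes.c_uint16 * max_tuples),
            ("nowunused", ctypes.c_uint16 * max_tuples),
            ("marked", ctypes.c_bool * (max_tuples + 1)),
        ]
    return PruneState


if __name__ == "__main__":
    for limit in (291, 2042):
        size = ctypes.sizeof(prune_state_type(limit))
        print(f"MaxHeapTuplesPerPage={limit}: about {size} bytes")
```

Under these assumptions the struct grows from roughly 2.6 kB to roughly 18 kB, which is small next to a typical stack size limit and is consistent with this not looking like a big problem.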

--
Robert Haas
EDB: http://www.enterprisedb.com
