Re: New IndexAM API controlling index vacuum strategies

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-01-20 06:34:53
Message-ID: CAD21AoAL11KHtnY2X3r2A8ektKnZtGRYFXVQkMc5AA-aJpS9UA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 20, 2021 at 9:45 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Jan 19, 2021 at 2:57 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > * Maybe it would be better if you just changed the definition such
> > that "MAXALIGN(SizeofHeapTupleHeader)" became "MAXIMUM_ALIGNOF", with
> > no other changes? (Some variant of this suggestion might be better,
> > not sure.)
> >
> > For some reason that feels a bit safer: we still have an "imaginary
> > tuple header", but it's just 1 MAXALIGN() quantum now. This is still
> > much less than the current 3 MAXALIGN() quantums (i.e. what
> > MaxHeapTuplesPerPage treats as the tuple header size). Do you think
> > that this alternative approach will be noticeably less effective
> > within vacuumlazy.c?
>
> BTW, I think that increasing MaxHeapTuplesPerPage will make it
> necessary to consider tidbitmap.c. Comments at the top of that file
> say that it is assumed that MaxHeapTuplesPerPage is about 256. So
> there is a risk of introducing performance regressions affecting
> bitmap scans here.
>
> Apparently some other DB systems make the equivalent of
> MaxHeapTuplesPerPage dynamically configurable at the level of heap
> tables. It usually doesn't matter, but it can matter with on-disk
> bitmap indexes, where the bitmap must be encoded from raw TIDs (this
> must happen before the bitmap is compressed -- there must be a simple
> mapping from every possible TID to some bit in a bitmap first). The
> item offset component of each heap TID is not usually very large, so
> there is a trade-off between keeping the representation of bitmaps
> efficient and not unduly restricting the number of distinct heap
> tuples on each heap page. I think that there might be a similar
> consideration here, in tidbitmap.c (even though it's not concerned
> about on-disk bitmaps).

That's a good point. With the patch, MaxHeapTuplesPerPage increases to
2042 with 8k pages and to 8186 with 32k pages, whereas it's currently
291 with 8k pages and 1169 with 32k pages. So it is likely to be a
problem, as you pointed out. If we change
"MAXALIGN(SizeofHeapTupleHeader)" to "MAXIMUM_ALIGNOF", it's 680 with
8k pages and 2728 with 32k pages, which seems much better.
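
The figures above can be reproduced with a small sketch. It assumes a
typical 64-bit build (SizeOfPageHeaderData = 24, sizeof(ItemIdData) = 4,
MAXIMUM_ALIGNOF = 8, SizeofHeapTupleHeader = 23) and infers from the
2042/8186 numbers that the patch counts no tuple header at all. The
names max_align() and max_tuples_per_page() are illustrative, not
PostgreSQL functions.

```c
#include <assert.h>

/* Assumed values for a typical 64-bit build; not taken from the patch. */
#define SIZE_OF_PAGE_HEADER  24   /* SizeOfPageHeaderData */
#define SIZE_OF_ITEM_ID       4   /* sizeof(ItemIdData) */
#define MAX_ALIGNOF           8   /* MAXIMUM_ALIGNOF */
#define SIZE_OF_TUPLE_HEADER 23   /* SizeofHeapTupleHeader */

/* Round len up to the next multiple of MAX_ALIGNOF, as MAXALIGN() does. */
int
max_align(int len)
{
    return (len + MAX_ALIGNOF - 1) & ~(MAX_ALIGNOF - 1);
}

/*
 * Number of tuples per page when every tuple is assumed to cost at least
 * header_size bytes plus one line pointer.
 *
 * header_size = max_align(SIZE_OF_TUPLE_HEADER)  current definition
 * header_size = MAX_ALIGNOF                      suggested alternative
 * header_size = 0                                the patch (inferred)
 */
int
max_tuples_per_page(int blcksz, int header_size)
{
    assert(blcksz > SIZE_OF_PAGE_HEADER && header_size >= 0);
    return (blcksz - SIZE_OF_PAGE_HEADER) / (header_size + SIZE_OF_ITEM_ID);
}
```

Calling max_tuples_per_page(8192, max_align(SIZE_OF_TUPLE_HEADER)) gives
the current limit, while passing MAX_ALIGNOF or 0 as header_size gives
the two proposed variants.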

The purpose of increasing MaxHeapTuplesPerPage in the patch is to let
a heap page accumulate more LP_DEAD line pointers. As I explained
before, because of the MaxHeapTuplesPerPage limit, we cannot calculate
how many LP_DEAD line pointers can be accumulated in the space
reserved by fillfactor simply as ((the space reserved by fillfactor) /
(size of line pointer)). We need to consider both how many line
pointers are available for LP_DEAD and how much space is available for
LP_DEAD.

For example, suppose the tuple size is 50 bytes and fillfactor is 80.
Each page then has 1633 bytes (=(8192-24)*0.2) of free space reserved
by fillfactor, where 408 line pointers can fit. However, if we store
250 LP_DEAD line pointers in that space, only 41 tuples can be stored
on the page because MaxHeapTuplesPerPage is 291, even though the
remaining 6534 bytes (=(8192-24)*0.8) could hold 121 tuples (+line
pointers). In this case, where the tuple size is 50 and fillfactor is
80, we can accumulate up to about 170 LP_DEAD line pointers while
storing 121 tuples. Increasing MaxHeapTuplesPerPage raises this 291
limit and lets us ignore the limit when calculating the maximum number
of LP_DEAD line pointers that can be accumulated on a single page.
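
The arithmetic in this example can be checked with a short sketch. It
assumes an 8k page with a 24-byte page header and 4-byte line pointers,
and uses integer division throughout. PageBudget and page_budget() are
hypothetical names made up for illustration only.

```c
#include <assert.h>

enum
{
    PAGE_HDR_BYTES = 24,    /* assumed SizeOfPageHeaderData */
    LINE_PTR_BYTES = 4      /* assumed sizeof(ItemIdData) */
};

typedef struct PageBudget
{
    int reserved_bytes;     /* free space reserved by fillfactor */
    int lp_fit_in_reserve;  /* line pointers that fit in that space */
    int fill_bytes;         /* space that tuples may use */
    int tuples_fit;         /* tuples (+line pointers) that fit there */
    int lp_dead_budget;     /* line pointers left over for LP_DEAD */
} PageBudget;

/* Split one page into the quantities used in the example. */
PageBudget
page_budget(int blcksz, int fillfactor, int tuple_size, int max_tuples)
{
    PageBudget  b;
    int         usable = blcksz - PAGE_HDR_BYTES;

    assert(fillfactor >= 10 && fillfactor <= 100 && tuple_size > 0);

    b.reserved_bytes = usable * (100 - fillfactor) / 100;
    b.lp_fit_in_reserve = b.reserved_bytes / LINE_PTR_BYTES;
    b.fill_bytes = usable * fillfactor / 100;
    b.tuples_fit = b.fill_bytes / (tuple_size + LINE_PTR_BYTES);
    /* Every live tuple also consumes one of the max_tuples line pointers. */
    b.lp_dead_budget = max_tuples - b.tuples_fit;
    return b;
}
```

With page_budget(8192, 80, 50, 291), the reserved space could hold far
more line pointers than the MaxHeapTuplesPerPage limit leaves over, so
the line pointer count, not the space, is the binding constraint.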

An alternative approach would be to calculate it using the average
tuple size. I think if we know the tuple size, the maximum number of
LP_DEAD line pointers that can be accumulated on a single page is the
minimum of the following two formulas:

(1) MaxHeapTuplesPerPage - (((BLCKSZ - SizeOfPageHeaderData) *
(fillfactor/100)) / (sizeof(ItemIdData) + tuple_size)); //how many
line pointers are available for LP_DEAD?

(2) ((BLCKSZ - SizeOfPageHeaderData) * (1 - fillfactor/100)) /
sizeof(ItemIdData); //how much space is available for LP_DEAD?
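
As a rough sketch, the two formulas can be combined as follows. The
function name max_lp_dead_per_page() is hypothetical, the page header
and line pointer sizes are assumed to be 24 and 4 bytes, fillfactor is
a percentage handled with integer division, and clamping formula (1) at
zero is an addition made here for safety rather than something stated
above.

```c
#include <assert.h>

/*
 * Upper bound on LP_DEAD line pointers per page: the smaller of
 * (1) the line pointers not needed by live tuples, and
 * (2) the line pointers that fit in the space reserved by fillfactor.
 */
int
max_lp_dead_per_page(int blcksz, int fillfactor, int tuple_size,
                     int max_tuples)
{
    const int   page_header = 24;   /* assumed SizeOfPageHeaderData */
    const int   item_id = 4;        /* assumed sizeof(ItemIdData) */
    int         usable = blcksz - page_header;
    int         live_tuples;
    int         by_line_pointers;
    int         by_space;

    assert(fillfactor >= 10 && fillfactor <= 100 && tuple_size > 0);

    /* Formula (1): how many line pointers are available for LP_DEAD? */
    live_tuples = (usable * fillfactor / 100) / (item_id + tuple_size);
    by_line_pointers = max_tuples - live_tuples;
    if (by_line_pointers < 0)
        by_line_pointers = 0;       /* clamp added in this sketch */

    /* Formula (2): how much space is available for LP_DEAD? */
    by_space = (usable * (100 - fillfactor) / 100) / item_id;

    return by_line_pointers < by_space ? by_line_pointers : by_space;
}
```

For 50-byte tuples and fillfactor 80 on an 8k page, formula (1) is the
smaller value when max_tuples is 291, and formula (2) takes over once
max_tuples is raised to 680 or 2042.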

But rather than introducing a complex theory, I'd prefer to increase
MaxHeapTuplesPerPage, as long as that doesn't affect the bitmap much.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
