Re: New IndexAM API controlling index vacuum strategies

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-13 05:34:21
Message-ID: CAD21AoCWqAhDN0RN6dpAa2CASj-WabZOG=FJfGhWFi8Mow3uHg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 9, 2021 at 2:22 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Mar 2, 2021 at 8:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Tue, Mar 2, 2021 at 2:34 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > > lazy_vacuum_table_and_indexes() should probably not skip index
> > > vacuuming when we're close to exceeding the space allocated for the
> > > LVDeadTuples array. Maybe we should not skip when
> > > vacrelstats->dead_tuples->num_tuples is greater than 50% of
> > > dead_tuples->max_tuples? Of course, this would only need to be
> > > considered when lazy_vacuum_table_and_indexes() is only called once
> > > for the entire VACUUM operation (otherwise we have far too little
> > > maintenance_work_mem/dead_tuples->max_tuples anyway).
> >
> > Doesn't it actually mean we consider how many dead *tuples* we
> > collected during a vacuum? I’m not sure how important the fact we’re
> > close to exceeding the maintenance_work_mem space. Suppose
> > maintenance_work_mem is 64MB, we will not skip both index vacuum and
> > heap vacuum if the number of dead tuples exceeds 5592404 (we can
> > collect 11184809 tuples with 64MB memory). But those tuples could be
> > concentrated in a small number of blocks, for example in a very large
> > table case. It seems to contradict the current strategy that we want
> > to skip vacuum if relatively few blocks are modified. No?
>
> There are competing considerations. I think that we need to be
> sensitive to accumulating "debt" here. The cost of index vacuuming
> grows in a non-linear fashion as the index grows (or as
> maintenance_work_mem is lowered). This is the kind of thing that we
> should try to avoid, I think. I suspect that cases where we can skip
> index vacuuming and heap vacuuming are likely to involve very few dead
> tuples in most cases anyway.
>
> We should not be sensitive to the absolute number of dead tuples when
> it doesn't matter (say because they're concentrated in relatively few
> heap pages). But when we overrun the maintenance_work_mem space, then
> the situation changes; the number of dead tuples clearly matters just
> because we run out of space for the TID array. The heap page level
> skew is not really important once that happens.
>
> That said, maybe there is a better algorithm. 50% was a pretty arbitrary number.

I agree that when we're close to overrunning the
maintenance_work_mem space, the situation changes. If we skip index
vacuuming even in that case, the next vacuum is likely to use up
maintenance_work_mem, leading to a second index scan, which is bad.
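For reference, the capacity arithmetic behind the numbers quoted upthread can be sketched like this. This is a standalone approximation, not the actual lazy_space_alloc() code; it ignores the LVDeadTuples struct header, which is why it comes out one slot higher than the 11184809 figure above:

```c
/* Approximate capacity of the dead-tuple TID array for a given
 * maintenance_work_mem, using the 6-byte ItemPointerData size. */
static long
dead_tuple_capacity(long mwm_bytes)
{
    const long itemptr_size = 6;    /* sizeof(ItemPointerData) */

    return mwm_bytes / itemptr_size;
}
```

With maintenance_work_mem = 64MB this gives 67108864 / 6 = 11184810 slots (the real code subtracts the array header, hence 11184809), and half of that is the ~5.59 million-tuple cutoff being discussed.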

If this threshold is meant to avoid a second index scan caused by
overrunning maintenance_work_mem, then using a ratio of
maintenance_work_mem is a good choice. On the other hand, if it's
meant to avoid accumulating debt that drives up the cost of index
vacuuming, a ratio of the total heap tuples seems better.

The situation we need to deal with here is a very large table that
has a lot of dead tuples, but where those fit in relatively few heap
pages (less than 1% of all heap blocks). In that case, the number of
dead tuples is also likely to be small relative to the total heap
tuples, as you mentioned. If the dead tuples fit in few pages but
accounted for most of the heap tuples, the situation would be more
serious, and there would almost certainly already be other problems.
Considering those conditions, I agree with using a ratio of
maintenance_work_mem as the threshold. Maybe we can increase the
constant to 70, 80, or so.
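Concretely, the check being discussed could look like the following. This is a hypothetical sketch against the LVDeadTuples fields mentioned upthread, not the actual patch; the SKIP_VACUUM_MEM_RATIO name and the 0.7 value are illustrative assumptions:

```c
#include <stdbool.h>

/* Simplified stand-in for the dead-tuple accumulator in vacuumlazy.c */
typedef struct LVDeadTuples
{
    long        max_tuples;     /* capacity, sized from maintenance_work_mem */
    long        num_tuples;     /* dead-tuple TIDs collected so far */
} LVDeadTuples;

/* Hypothetical constant: never skip index vacuuming once this fraction
 * of the TID array has been consumed. */
#define SKIP_VACUUM_MEM_RATIO 0.7

static bool
index_vacuum_skippable(const LVDeadTuples *dead_tuples)
{
    /* Skipping is only reasonable while there is ample headroom left in
     * the array; otherwise the next VACUUM risks a second index scan. */
    return dead_tuples->num_tuples <
        (long) (dead_tuples->max_tuples * SKIP_VACUUM_MEM_RATIO);
}
```

The other skip conditions (e.g. the SKIP_VACUUM_PAGES_RATIO test on the fraction of modified heap blocks) would be combined with this one, so both must hold before index vacuuming is skipped.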

>
> Have you thought more about how the index vacuuming skipping can be
> configured by users? Maybe a new storage param, that works like the
> current SKIP_VACUUM_PAGES_RATIO constant?

Since it's not yet clear to me that a new storage parameter or GUC
for this feature would be useful even for future improvements in this
area, I haven't thought much about letting users control index
vacuum skipping. I'm okay with a constant threshold value for now.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
