New IndexAM API controlling index vacuum strategies

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: New IndexAM API controlling index vacuum strategies
Date: 2020-12-22 22:54:04
Message-ID: CAD21AoD0SkE11fMw4jD4RENAwBMcw1wasVnwpJVw3tVqPOQgAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

I've started this separate thread from [1] for discussing the general
API design of index vacuum.

Summary:

* Call ambulkdelete and amvacuumcleanup even when INDEX_CLEANUP is
false, and leave it to the index AM whether or not skip them.
* Add a new index AM API amvacuumstrategy(), asking the index AM the
strategy before calling to ambulkdelete.
* Whether or not remove garbage tuples from heap depends on multiple
factors including INDEX_CLEANUP option and the answers of
amvacuumstrategy() for each index AM.

The first point is to fix the inappropriate behavior discussed on the thread[1].

The second and third points are to introduce a general framework for
future extensibility. User-visible behavior is not changed by this
change.

The new index AM API, amvacuumstrategy(), which is called before
bulkdelete() for each index and asks the index bulk-deletion strategy.
On this API, lazy vacuum asks, "Hey index X, I collected garbage heap
tuples during heap scanning, how urgent is vacuuming for you?", and
the index answers either "it's urgent" when it wants to do
bulk-deletion or "it's not urgent, I can skip it". The point of this
proposal is to isolate heap vacuum and index vacuum for each index so
that we can employ different strategies for each index. Lazy vacuum
can decide whether or not to do heap clean based on the answers from
the indexes.

By default, if all indexes answer 'yes' (meaning it will do
bulkdelete()), lazy vacuum can do heap clean. On the other hand, if
even one index answers 'no' (meaning it will not do bulkdelete()),
lazy vacuum doesn't the heap clean. Lazy vacuum would also be able to
require indexes to do bulkdelete() for some reason such as specyfing
INDEX_CLEANUP option by the user. It’s something like saying "Hey
index X, you answered not to do bulkdelete() but since heap clean is
necessary for me please don't skip bulkdelete()".

Currently, if INDEX_CLEANUP option is not set (i.g.
VACOPT_TERNARY_DEFAULT in the code), it's treated as true and will do
heap clean. But with this patch we use the default as a neutral state
('smart' mode). This neutral state could be "on" and "off" depending
on several factors including the answers of amvacuumstrategy(), the
table status, and user's request. In this context, specifying
INDEX_CLEANUP would mean making the neutral state "on" or "off" by
user's request. The table status that could influence the decision
could concretely be, for instance:

* Removing LP_DEAD accumulation due to skipping bulkdelete() for a long time.
* Making pages all-visible for index-only scan.

Also there are potential enhancements using this API:

* If bottom-up index deletion feature[2] is introduced, individual
indexes could be a different situation in terms of dead tuple
accumulation; some indexes on the table can delete its garbage index
tuples without bulkdelete(). A problem will appear that doing
bulkdelete() for such indexes would not be efficient. This problem is
solved by this proposal because we can do bulkdelete() for a subset of
indexes on the table.

* If retail index deletion feature[3] is introduced, we can make the
return value of bulkvacuumstrategy() a ternary value: "do_bulkdelete",
"do_indexscandelete", and "no".

* We probably can introduce a threshold of the number of dead tuples
to control whether or not to do index tuple bulk-deletion (like
bulkdelete() version of vacuum_cleanup_index_scale_factor). In the
case where the amount of dead tuples is slightly larger than
maitenance_work_mem the second time calling to bulkdelete will be
called with a small number of dead tuples, which is inefficient. This
problem is also solved by this proposal by allowing a subset of
indexes to skip bulkdelete() if the number of dead tuple doesn't
exceed the threshold.

I’ve attached the PoC patch for the above idea. By default, since lazy
vacuum choose the vacuum bulkdelete strategy based on answers of
amvacuumstrategy() so it can be either true or false ( although it’s
always true in the currene patch). But for amvacuumcleanup() there is
no the neutral state, lazy vacuum treats the default as true.

Comment and feedback are very welcome.

Regards,

[1] https://www.postgresql.org/message-id/20200415233848.saqp72pcjv2y6ryi%40alap3.anarazel.de
[2] https://www.postgresql.org/message-id/CAH2-Wzm%2BmaE3apHB8NOtmM%3Dp-DO65j2V5GzAWCOEEuy3JZgb2g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/425db134-8bba-005c-b59d-56e50de3b41e%40postgrespro.ru

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/

Attachment Content-Type Size
poc_vacuumstrategy.patch application/octet-stream 37.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhihong Yu 2020-12-22 23:22:19 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message Tomas Vondra 2020-12-22 22:45:42 Re: Parallel bitmap index scan