Quick Links

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From:	Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>
To:	Sergey Sargsyan <sergey(dot)sargsyan(dot)2001(at)gmail(dot)com>
Cc:	Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Subject:	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date:	2025-11-09 18:02:00
Message-ID:	CADzfLwVaV15R2rUNZmKqLKweiN3SnUBg=6_qGE_ERb7cdQUD8g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello!

This is a rebased version.

Also I decided to keep only part 3 for now, because we need some
common solution to keep the horizon advance for both INDEX and REPACK
operations [0].
More complex solution description and benchmark results are available at [3].

PART 3
STIR-based validation phase CIC

That part is about a way to replace the second phase of CIC in a more
effective way (and with the ability to allow horizon advance as an
additional bonus).

The role of the second phase is to find tuples which are not present
in the index built by the first scan, because:
- some of them were too new for the snapshot used during the first phase
- even if we were to use SnapshotSelf to accept all alive tuples –
some of them may be inserted in pages already visited by the scan

The main idea is:
- before starting the first scan lets prepare a special auxiliary
super-lightweight index (it is not even an index or access method,
just pretends to be) with the same columns, expressions and predicates
- that access method (Short Term Index Replacement – STIR) just
appends TID of new coming tuples, without WAL, minimum locking,
simplest append-only structure, without actual indexed data
- it remembers all new TIDs inserted to the table during the first phase
- once our main (target) index receives updates itself we may safely
clear "ready" flag on STIR
- if our first phase scan missed something – it is guaranteed to be
present in that STIR index
- so, instead of requirement to compare the whole table to the index,
we need only to compare to TIDs stored in the STIR
- as a bonus we may reset snapshots during the comparison without risk
of any issues caused by HOT pruning (the issue [2] caused revert of
[1]).

That approach provides a significant performance boost in terms of
time required to build the index. STIR itself theoretically causes
some performance impact, but I was not able to detect it. Also, some
optimizations are applied to it (see below). Details of benchmarks are
presented below as well.

Commits are:
- Add STIR access method and flags related to auxiliary indexes

This one adds STIR code and some flags to distinguish real and
auxiliary indexes.

- Add Datum storage support to tuplestore

Add ability to store Datum in tuplestore. It is used by the following
commits to leverage performance boost from prefetching of the pages
during the validation phase.

- Use auxiliary indexes for concurrent index operations

The main part is here. It contains all the logic for creation of
auxiliary index, managing its lifecycle, new validation phase and so
on (including progress reporting, some documentation updates, ability
to have an unlogged index for logged tables, etc). At the same time it
still relies on a single referenced snapshot during the validation
phase.

- Track and drop auxiliary indexes in DROP/REINDEX

That commit adds different techniques to avoid any additional
administration requirements to deal with auxiliary indexes in case of
error during the index build (junk auxiliary indexes). It adds
dependency tracking, special logic for handling REINDEX calls and
other small things to make the administrator's life a little bit easier.

- Optimize auxiliary index handling

Since the STIR index does not contain any actual data we may skip
preparation of that during tuple insert. Commit implements such
optimization.

- Refresh snapshot periodically during index validation

Adds logic to the new validation phase to reset the snapshot every so
often. Currently it does it every 4096 pages visited.
Probably a caveat here is the requirement to call
InvalidateCatalogSnapshot to make sure xmin propagates.
But AFAIK the same may happen between transaction boundaries in CIC
anyway - and ShareUpdateExclusiveLock on table is enough.

[0]: https://www.postgresql.org/message-id/flat/202510301734.pj4uds3mqxx4%40alvherre.pgsql#fd20662912580a89b860790f9729aaff
[1]: https://github.com/postgres/postgres/commit/d9d076222f5b94a85e0e318339cfc44b8f26022d
[2]: https://www.postgresql.org/message-id/flat/20220524190133.j6ee7zh4f5edt5je%40alap3.anarazel.de#1781408f40034c414ad6738140c118ef
[3]: https://www.postgresql.org/message-id/CADzfLwVOcZ9mg8gOG+KXWurt=MHRcqNv3XSECYoXyM3ENrxyfQ@mail.gmail.com

Best regards,
Mikhail.

Attachment	Content-Type	Size
v26-0005-Use-auxiliary-indexes-for-concurrent-index-opera.patch	text/plain	94.4 KB
v26-0001-This-is-https-commitfest.postgresql.org-50-5160-.patch	text/plain	23.2 KB
v26-0003-Add-STIR-access-method-and-flags-related-to-auxi.patch	text/plain	37.3 KB
v26-0002-Add-stress-tests-for-concurrent-index-builds.patch	text/plain	9.1 KB
v26-0004-Add-Datum-storage-support-to-tuplestore.patch	text/plain	19.0 KB
v26-0006-Track-and-drop-auxiliary-indexes-in-DROP-REINDEX.patch	text/plain	30.5 KB
v26-0007-Optimize-auxiliary-index-handling.patch	text/plain	2.4 KB
v26-0008-Refresh-snapshot-periodically-during-index-valid.patch	text/plain	20.8 KB

In response to

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements at 2025-10-28 18:37:00 from Mihail Nikalayeu

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2025-11-09 18:14:57	Re: should we have a fast-path planning for OLTP starjoins?
Previous Message	Tom Lane	2025-11-09 16:03:31	Re: pg_dump not dumping default_text_search_config WAI?