Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY

From: Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Antonin Houska <ah(at)cybertec(dot)at>, Sergey Sargsyan <sergey(dot)sargsyan(dot)2001(at)gmail(dot)com>, Hannu Krosing <hannuk(at)google(dot)com>
Subject: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY
Date: 2026-01-15 17:19:58
Message-ID: CADzfLwWAiimwngq38x8Ji26AtYEmSqnFD7aa4kVGF4ObDswHMA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, everyone!

(added to CC involved persons based on history part and previous thread
activity)

This thread is an extraction of part of
https://commitfest.postgresql.org/patch/4971/ (STIR) into a separate
commitfest entry as was agreed in [0].

HISTORY

In 2021 Álvaro proposed [1] and committed [2] the feature: VACUUM ignores
snapshots involved in concurrent indexing operations. This was a great
feature in PG14.

But in 2022 a bug related to the tuples missing in indexes was detected,
and a little bit later explained by Andres [3]. As a result, the feature
was reverted [4].

This patch set brings back part of it (first phase of CIC) using
snapshot resetting technique.
For the second phase - another solution is available in entry mentioned
above.

STRUCTURE

It is based on Matthias' idea [5] - to just reset snapshots every so often
during a concurrent index build. It may work only during the first scan
(because we'll miss some tuples during validation scan with such an
approach).
Logic is simple – since the index built by the first scan already misses a
lot of tuples – we may not worry to miss a few more – the validation phase
is going to fix it anyway. Of course, it is not so simple in case of unique
indexes, but still possible.

Commits are:

- Add stress tests for concurrent index builds

This is a set of stress tests to ensure concurrent index operations are
worked correctly.
It easily detects the bug from 2022 (and other different things I met
during the development, some were in master and already fixed - [6] and
[7]).

- Reset snapshots periodically in non-unique non-parallel concurrent index
builds

Apply this technique to the simplest case – non-unique and non-parallel.
Snapshot is changed "between" pages.
One possible place here to worry about – to ensure xmin advanced we need to
call InvalidateCatalogSnapshot during each snapshot switch.
So, theoretically it may cause some issues, but the table is locked to
changes during the process. At least commit [2] (which ignored xmin of
CIC backend) did the same thing in essence, actually. Another more "clear"
option here - we may just extract a separate catalog snapshot horizon (one
more field near xmin specially only for catalog snapshot), it seems to be a
pretty straightforward change.

- Support snapshot resets in parallel concurrent index builds

Extend that technique to parallel builds. It is mostly about ensuring
workers have an initial snapshot restored from the leader before the leader
goes to reset it.

- Support snapshot resets in concurrent builds of unique indexes

The most tricky commit in the patch set – apply that to unique indexes.
Changing of snapshots may cause issues with validation of unique
constraints. Currently, validation is done during the sorting of tuples,
but that doesn't work with tuples read with different snapshots (some of
them are dead already).

To deal with it:
- in case we see two identical tuples during tuplesort – ignore if some of
them are dead according to SnapshotSelf, but fail if two are alive. It is
not a required part, it is just mechanics for fail-fast behavior and may be
removed or limited in spent resources.
- to provide the guarantee – during _bt_load compare the inserted index
value with previously inserted. If they are equal – make sure only a single
SnapshotSelf alive tuple exists in the whole equal "group" (it may include
more than two tuples in general).

Theoretically it may affect performance of _bt_load because of
_bt_keep_natts(_fast) call for each tuple, but I was unable to notice any
significant difference here.

Best regards,
Mikhail.

[0]:
https://www.postgresql.org/message-id/flat/CAEze2Wg6d2M8hop4uwdQXeH-YkOmEHyqp83%2BaE7vEzKdmu7w-A%40mail.gmail.com#b5dbefccc537167bc8c1efe0ad063491
[1]:
https://www.postgresql.org/message-id/flat/20210115142926.GA19300%40alvherre.pgsql#0988173cb0cf4b8eb710a6cdaa88fcac
[2]:
https://github.com/postgres/postgres/commit/d9d076222f5b94a85e0e318339cfc44b8f26022d
[3]:
https://www.postgresql.org/message-id/flat/20220524190133.j6ee7zh4f5edt5je%40alap3.anarazel.de#1781408f40034c414ad6738140c118ef
[4]:
https://github.com/postgres/postgres/commit/e28bb885196916b0a3d898ae4f2be0e38108d81b
[5]:
https://www.postgresql.org/message-id/flat/CAEze2WgW6pj48xJhG_YLUE1QS%2Bn9Yv0AZQwaWeb-r%2BX%3DHAxU_g%40mail.gmail.com#b3809c158de4481bb1b29894aaa63fae
[6]: https://commitfest.postgresql.org/patch/5438/
[7]: https://commitfest.postgresql.org/patch/5160/

Attachment Content-Type Size
v1-0001-Add-stress-tests-for-concurrent-index-builds.patch application/x-patch 9.3 KB
v1-0004-Support-snapshot-resets-in-concurrent-builds-of-u.patch application/x-patch 42.6 KB
v1-0002-Reset-snapshots-periodically-in-non-unique.patch application/x-patch 50.8 KB
v1-0003-Support-snapshot-resets-in-parallel-concurrent-in.patch application/x-patch 37.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-01-15 17:21:15 Re: how to gate experimental features (SQL/PGQ)
Previous Message Andrew Dunstan 2026-01-15 17:05:44 Re: how to gate experimental features (SQL/PGQ)