Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Hannu Krosing <hannuk(at)google(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Antonin Houska <ah(at)cybertec(dot)at>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Sergey Sargsyan <sergey(dot)sargsyan(dot)2001(at)gmail(dot)com>, alvherre(at)kurilemu(dot)de, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: 2025-12-17 18:22:56
Message-ID: CAEze2WjjL2Uee7ZZzFr+sDe9jdFtaoX6TWdry6NRvGr2FX0ASw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 16 Dec 2025 at 14:43, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> Summary of CIC as it is today
> -----------------------------
>
> To recap, the CIC approach at very high level is:
>
> 1. Build the index, while backends are modifying the table concurrently
>
> 2. Retail insert all the tuples that we missed in step 1.
>
> A lot of logic and coordination goes into determining what was missed in
> step 1. Currently, it involves snapshots, waiting for concurrent
> transactions to finish, and re-scanning the index and the table.
>
> The STIR idea is to maintain a little data structure on the side where
> we collect items that are inserted between steps 1 and 2, to avoid
> re-scanning the table.

During step 1, up to step 2, indeed.

> Shmem struct
> ------------
>
> One high-level observation:
>
> We're using the catalog for inter-process communication, with the
> indisready and indisvalid flags, and now with STIR by having a special,
> ephemeral index AM. That feels unnecessarily difficult. I propose that
> we introduce a little shared memory struct to keep track of in-progress
> CONCURRENTLY index builds.
>
> In the first transaction that inserts the catalog entry with
> indisready=false, also create a shmem struct. In that struct, we can
> store information about what state the build is in, and whether
> insertions should go to the STIR or to the real index.
>
> Avoid one wait-for-all-transactions step using the shmem struct
> ---------------------------------------------------------------
>
> As one small incremental improvement, we could use the shmem struct to
> avoid one of the "wait for all transactions" steps in the current
> implementation. In validate_index(), after we mark the index as
> 'indisready' we have to wait for all transactions to finish, to ensure
> that all subsequent insertions have seen the indisready=true change. We
> could avoid that by setting a flag in the shmem struct instead, so that
> all backends would see instantly that the flag is flipped.
>
>
> Improved STIR approach
> ----------------------
>
> Here's another proposal using the STIR approach. It's a little different
> from the patches so far:
> [many steps]

I am not convinced that this new approach is correct, as it introduces
too many new moving components into concurent index creation. I'm
quite concerned about the correctness around snapshot xmax checks:
while the approach is in a different system than that of PG14.0's CIC
changes, I can't help but think that this will open another can of
worms that is similar to that bug; it also changes expectations about
snapshot contents (by conditionally including tuples in the
"visibility" of STIR data structure), and I'm not even sure that it
guarantees that STIR contains all possibly-visible-after-CIC tuples
that aren't visible in the snapshot(s) of the main table scan.

So, let's not complicate these changes more than what they already are.

> Snapshot refreshing
> -------------------
>
> The above proposal doesn't directly accomplish the original goal of
> advancing the global xmin horizon. You still need two long-lived
> snapshots. It does however make CIC faster, by eliminating the full
> index scan and table scan in the validate_index() stage. That already
> helps a little.
>
> I believe it can be extended to also advance xmin horizon:
>
> - In step 4, while we are building the index, we can periodically get a
> new snapshot, update the cutoff in the shmem struct, and drain the STIR
> of the tuples that are already in it.

I'm against periodic draining of STIR:

1. With these periodic index scans, each heap page may be accessed
many more times than the current 2 times, as heap pages may be updated
at least once every time STIR is drained (up to MaxHeapTuplesPerPage);
thus strictly increasing the heap IO requirement compared to only
using STIR at phase 2.
2. The STIR index will contain TIDs of pages we have yet to scan; we'd
have to filter these out if we want to prevent duplicate TID insertion
(or we would risk having the STIR TID visible in an upcoming
visibility snapshot and inserting it twice).
3. The STIR would contain TIDs that may already be dead by the end of
the first phase, scanning the STIR index early could mean we'd still
have added it to the index.
4. The current approach to re-snapshotting happens completely inside
the table AM. Adding STIR draining to this phase would require
completely new code inside tableAMs or inside index AMs. It doesn't
make much sense for a tableAM to know about this.

All together I think those drawbacks for periodically draining STIR
are too significant to consider right now.

> - In step 7, we can take a new snapshot as often as we like. The
> snapshot is only used to evaluate expressions.

We shouldn't try to insert dead tuples into the index, so shouldn't we
also use some visibility checks in that step?

Kind regards,

Matthias van de Meent

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirill Reshke 2025-12-17 18:27:05 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Previous Message Andres Freund 2025-12-17 17:55:33 Re: Updating IPC::Run in CI?