Quick Links

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From:	Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>
To:	Hannu Krosing <hannuk(at)google(dot)com>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Sergey Sargsyan <sergey(dot)sargsyan(dot)2001(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>
Subject:	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date:	2025-11-28 20:41:15
Message-ID:	CADzfLwXL6cCksMeouTFhXcJJexCw5RqE6qCn79OL=P8BwoNjFw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi, Hannu!

I think you pressed "Reply" instead of "Reply All" - so, I put it to
the list (looks like nothing is secret here).
Mostly it is because of my opinion at the end of the mail which I want
to share with the list.

On Fri, Nov 28, 2025 at 8:33 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> If it is an *index AM* then this may not solve HOT chains issue (see
> below), if we put it on top of *table AM* as some kind of pass-through
> collector then likely yes, though you may still want to do final sort
> in commit order to know which one is the latest version of updated
> tuples which needs to go in the index. The latter is not strictly
> needed, but would be a nice optimisation for oft-updated rows.

It is AM which is added as an index (with the same
columns/expressions/predicates) to the table before phase 1 starts.
So, all new tuples are inserted into it.

> And I would not collect just TID, but also the indexes value, as else
> we end up accessing the table in some random order for getting the
> value (and possibly do visibility checks)
Just TIDs - it is ordered at validation phase (while merging with an
main index) and read using AIO - pretty fast.

> I am not sure where we decide that tuple is HOT-updatable, but I
> suspect that it is before we call any index AMs, so STIR ios not
> guaranteed to solve the issues with HOT chains.

I am not sure what the HOT-chains issue is, but it actually works
correctly already, including stress tests.
It is even merged into one commercial fork of PG (I am not affiliated
with it in any way).

> (And yes, I have a patch in works to include old and new tids> as part
> of logical decoding - they are "almost there", just not passed through
> - which would help here too to easily keep just the last value)

Yes, at least it is required for the REPACK case.

But....

Antonin already has a prototype of patch to enable logical decoding
for all kinds of tables in [0] (done in scope of REPACK).

So, if we have such mechanics in place, it looks nice (and almost the
same) for both CIC and REPACK:
* in both cases we create temporary slot to collect incoming tuples
* in both cases scan the table resetting snapshot every few pages to
keep xmin horizon propagate
* in both cases the process already collected part every few megabytes
* just the logic of using collected tuples is different...

So, yes, from terms of effectiveness STIR seems to be better, but such
a common approach like LD looks tempting to have for both REPACK/CIC.

On Fri, Nov 28, 2025 at 5:58 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> -1: Requiring the logical decoding system just to reindex an index
> without O(tablesize) lock time adds too much overhead,

How big is the additional cost of maintaining logical decoding for a
table? Could you please evolve a little bit?

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/152010.1751307725%40localhost
(v15-0007-Enable-logical-decoding-transiently-only-for-REPACK-.patch)

In response to

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements at 2025-11-28 19:08:12 from Hannu Krosing

Responses

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements at 2025-11-28 21:01:04 from Hannu Krosing

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Álvaro Herrera	2025-11-28 20:51:44	Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY
Previous Message	Tom Lane	2025-11-28 20:40:19	Re: pgsql: Inline pg_ascii_tolower() and pg_ascii_toupper().