Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: 2025-11-28 09:51:16
Message-ID: CAEze2WjE=LJ_m_uWFDHgwXaxvqN-wna7A40=XWJ4727NpmTQ2w@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 28 Nov 2025 at 10:05, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> wrote:
> > 3. Having the index marked indisready before it contains any data is
> > going to slow down the indexing process significantly:
> > a. The main index build now must go through shared memory and buffer
> > locking, instead of being able to use backend-local memory
> > b. The tuple-wise insertion path (IndexAmRoutine->aminsert) can have a
> > significantly higher overhead than the bulk insertion logic in
> > ambuild(); in metrics of WAL, pages accessed (IO), and CPU cycles
> > spent.
> >
> > So, I don't think moving away from ambuild() as basis for initially
> > building the index this is such a great idea.
> >
> > (However, I do think that having an _option_ to build the index using
> > ambuildempty()+aminsert() instead of ambuild() might be useful, if
> > only to more easily compare "natural grown" indexes vs freshly built
> > ones, but that's completely orthogonal to CIC snapshotting
> > improvements.)
>
> The retail insertions are not something this proposal depends on. I think it'd
> be possible to build a separate index locally and then "merge" it with the
> regular one. I just tried to propose a solution that does not need snapshots.

I'm not sure we can generalize indexes to the point where merging two
built indexes is always both possible and efficient.

For example, the ANN indexes of pgvector (both HNSW and IVF) could
possibly have merge operations between indexes of the same type and
schema, but it would require a lot of effort on the side of the AM to
support merging; there is no trivial merge operation that also retains
the quality of the index without going through the aminsert() path.
Conversely, the current approach to CIC doesn't require additional
work on the index AM's side, and that's a huge enabler for every kind
of index.

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2025-11-28 10:00:48 Re: How can end users know the cause of LR slot sync delays?
Previous Message Chao Li 2025-11-28 09:39:23 Re: Proposal: adding --enable-shadows-warning