| From: | Hannu Krosing <hannuk(at)google(dot)com> |
|---|---|
| To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
| Cc: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Sergey Sargsyan <sergey(dot)sargsyan(dot)2001(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
| Subject: | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
| Date: | 2025-11-28 19:08:12 |
| Message-ID: | CAMT0RQR7J+mQ=3ufSVV5VhChrhaSGgYSk2Zwe9e=_EOc-v-cwQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Nov 28, 2025 at 7:31 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Fri, 28 Nov 2025 at 18:58, Hannu Krosing <hannuk(at)google(dot)com> wrote:
> >
> > On Fri, Nov 28, 2025 at 5:58 PM Matthias van de Meent
> > <boekewurm+postgres(at)gmail(dot)com> wrote:
> > >
> > ...
> > > I'm a bit worried, though, that LR may lose updates due to commit
> > > order differences between WAL and PGPROC. I don't know how that's
> > > handled in logical decoding, and can't find much literature about it
> > > in the repo either.
> >
> > Now the reference to logical decoding made me think that maybe to real
> > fix for CIC would be to leverage logical decoding for the 2nd pass of
> > CIC and not wore about in-page visibilities at all.
>
> -1: Requiring the logical decoding system just to reindex an index
> without O(tablesize) lock time adds too much overhead, and removes
> features we currently have (CIC on unlogged tables). wal_level=logical
> *must not* be required for these tasks if we can at all avoid it.
> I'm also not sure whether logical decoding gets access to the HOT
> information of the updated tuples involved, and therefore whether the
> index build can determine whether it must or can't insert the tuple.
There are more and more cases (not just CIC here) where using logical
decoding would be the most efficient solution, so why not instead
start improving it instead of complicating the system in various
places?
We could even start selectively logging UNLOGGED and TEMP tables when
we start CIC if CIC has enough upsides.
> I don't think logical decoding is sufficient, because we don't know
> which tuples were already inserted into the index by their own
> backends, so we don't know which tuples' index entries we must skip.
The premise of pass2 in CIC is that we collect all the rows that were
inserted after CIC started for which we are not 100% sure that they
are inserted in the index. We can only be sure they are inserted for
transactions started after pass1 completed and the index became
visible and available for inserts.
I am sure that it is possible to avoid inserting duplicate entry (same
value and tid) at insert time.
And we do not care about hot update chains dusing normal CREATE INDEX
or first pass of CIC - we just index what is visible NOW wit no regard
of weather the tuple is at the end of HOT update chain.
> Kind regards,
>
> Matthias van de Meent.
>
> PS. I think the same should be true for REPACK CONCURRENTLY, but
> that's a new command with yet-to-be-determined semantics, unlike CIC
> which has been part of PG for 6 years.
CIC has been around way longer, since 8.2 released in 2006, so more
like 20 years :)
---
Cheers
Hannu
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tomas Vondra | 2025-11-28 19:21:03 | Re: should we have a fast-path planning for OLTP starjoins? |
| Previous Message | Robert Haas | 2025-11-28 18:57:14 | Re: should we have a fast-path planning for OLTP starjoins? |