| From: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com> |
|---|---|
| To: | Hannu Krosing <hannuk(at)google(dot)com> |
| Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Sergey Sargsyan <sergey(dot)sargsyan(dot)2001(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
| Subject: | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
| Date: | 2025-11-28 18:40:45 |
| Message-ID: | CADzfLwX7u5P=Utt4kj-KOjgC_8fhsinpmLtadqkHrohE5Y5H+Q@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello!
On Fri, Nov 28, 2025 at 7:05 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> 1. While the first pass of CIC is collecting the visible tuple for
> index the logical decoding collector also collects any new tuples
> added after the CIC start.
> 2. When the first pass collection finishes, it also gets the indexes
> collected so far by the logical decoding collectoir and adds them to
> the first set before the sorting and creating the index.
>
> 3. once the initial index is created, the CIC just gets whatever else
> was collected after 2. and adds these to the index
It feels very similar to the approach with STIR (upper in that thread)
- instead of doing the second scan - just collect all the new-coming
TIDs in short-term-index-replacement access method.
I think STIR lightweight AM (contains just TID) is a better option
here than logical replication due several reason (Mathias already
mentioned some of them).
Anyway, it looks like things\threads became a little bit mixed-up,
I'll try to structure it a little bit.
For CIC/RC approach with resetting snapshot during heap scan - it is
enough to achieve vacuum-friendly state in phase 1.
For phase 2 (validation) - we need an additional thing - something to
collect incoming tuples (STIR index AM is proposed). In that case we
achieve vacuum-friendly for both phases + single heap scan.
STIR at the same time may be used as just way to make CIC faster
(single scan) - without any improvements related to VACUUM.
You may check [0] for links.
Another topic is REPACK CONCURRENTLY, which itself leaves in [1]. It
is already based on LR.
I was talking about a way to use the same tech (reset snapshot during
the scan) for REPACK also, leveraging the already introduced LR
decoding part.
Mikhail.
[0]: https://www.postgresql.org/message-id/flat/CADzfLwWkYi3r-CD_Bbkg-Mx0qxMBzZZFQTL2ud7yHH2KDb1hdw%40mail.gmail.com
[1]: https://www.postgresql.org/message-id/flat/202507262156.sb455angijk6%40alvherre.pgsql
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mihail Nikalayeu | 2025-11-28 18:42:31 | Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY |
| Previous Message | Andres Freund | 2025-11-28 18:39:19 | Re: Remove unused function parameters, part 2: replication |