| From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
|---|---|
| To: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com> |
| Cc: | Sergey Sargsyan <sergey(dot)sargsyan(dot)2001(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
| Subject: | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
| Date: | 2025-11-28 16:57:55 |
| Message-ID: | CAEze2WiiR2PeXg_vaURjjiiwvjQ=Um8wxWi1BcVS0BGyxiD2gQ@mail.gmail.com |
| Lists: | pgsql-hackers |
On Fri, 28 Nov 2025 at 15:50, Mihail Nikalayeu
<mihailnikalayeu(at)gmail(dot)com> wrote:
>
> Hello!
>
> On Thu, Nov 27, 2025 at 9:07 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > While it might not break, and might not hold back other tables'
> > visibility horizons, it'll still hold back pruning on the table we're
> > acting on, and that's likely one which already had bloat issues if
> > you're running RIC (or REPACK).
>
> Yes, a good point about REPACK, agreed.
>
> BTW, what is about using the same reset snapshot technique for REPACK also?
>
> I thought it is impossible, but what if we:
>
> * while reading the heap we "remember" our current page position into
> shared memory
> * preserve all xmin/max/cid into newly created repacked table (we need
> it for MVCC-safe approach anyway)
> * in logical decoding layer - we check TID of our tuple and looking at
> "current page" we may correctly decide what to do with at apply phase:
>
> - if it in "non-yet read pages" - ignore (we will read it later) - but
> signal scan to ensure it will reset snapshot before that page
> (reset_before = min(reset_before, tid))
> - if it in "already read pages" - remember the apply operation (with
> exact target xmin/xmax and resulting xmin/xmax)
Yes, exactly - keep track of which snapshot was used for which part of
the table, and treat all updates that add tuples to or remove tuples
from the scanned range after that snapshot as inserts/deletes, similar
to how it'd work if logical replication (LR) had a filter on
`ctid BETWEEN '(0, 0)' AND '(end-of-snapshot-scan)'` which then gets
updated every so often.
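
A rough sketch of that classification, in Python rather than C to keep
it short. All names here (`Tid`, `ScanState`, `next_block`,
`on_change`) are hypothetical and not taken from any patch; it assumes
the scan publishes the first block it has not yet read, and it tracks
`reset_before` as a block number. It only models the TID-range
decision and ignores xmin/xmax/cid handling entirely.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Optional, Tuple


class Tid(NamedTuple):
    """Simplified ctid: (block number, offset within block)."""
    block: int
    offset: int


@dataclass
class ScanState:
    # Assumption: first heap block the scan has NOT read yet, as it
    # would be published in shared memory by the scanning backend.
    next_block: int = 0
    # Lowest unread block touched by a decoded change; the scan should
    # reset its snapshot before reading this block.
    reset_before: Optional[int] = None
    # Changes remembered for the apply phase: (action, old_tid, new_tid).
    pending: List[Tuple[str, Optional[Tid], Optional[Tid]]] = field(
        default_factory=list
    )

    def in_scanned_range(self, tid: Optional[Tid]) -> bool:
        """True if the tuple version lives in an already-read block."""
        return tid is not None and tid.block < self.next_block

    def on_change(self, old: Optional[Tid], new: Optional[Tid]) -> str:
        """Classify one decoded change by where its tuple versions live.

        old is None for a plain INSERT, new is None for a plain DELETE.
        """
        # Any tuple version in a not-yet-read block will be picked up
        # by the scan itself, provided the snapshot is reset in time.
        for tid in (old, new):
            if tid is not None and not self.in_scanned_range(tid):
                if self.reset_before is None:
                    self.reset_before = tid.block
                else:
                    self.reset_before = min(self.reset_before, tid.block)

        old_in = self.in_scanned_range(old)
        new_in = self.in_scanned_range(new)
        if old_in and new_in:
            action = "update"
        elif new_in:
            action = "insert"   # tuple moved into (or was added to) the range
        elif old_in:
            action = "delete"   # tuple moved out of (or was removed from) it
        else:
            action = "ignore"   # entirely in unread pages; read it later

        if action != "ignore":
            self.pending.append((action, old, new))
        return action
```

With `next_block = 10`, an UPDATE that moves a tuple from block 3 to
block 12 would be remembered as a delete and would set `reset_before`
to 12, while an UPDATE from block 15 to block 2 would be remembered as
an insert.
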
I'm a bit worried, though, that LR may lose updates due to commit
order differences between WAL and PGPROC. I don't know how that's
handled in logical decoding, and can't find much literature about it
in the repo either.

Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)