Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: 2024-03-07 18:36:53
Message-ID: CANtu0og_=ypCbH2ZFayn44i=CL0HAXKW390LfZhQ1F56HoFXtQ@mail.gmail.com
Lists: pgsql-hackers

Hello!

> I'm not a fan of this approach. Changing visibility and cleanup
> semantics to only benefit R/CIC sounds like a pain to work with in
> essentially all visibility-related code. I'd much rather have to deal
> with another index AM, even if it takes more time: the changes in
> semantics will be limited to a new plug in the index AM system and a
> behaviour change in R/CIC, rather than behaviour that changes in all
> visibility-checking code.

Technically, this does not affect the visibility logic, only the
cleanup semantics.
All visibility-related code remains untouched.
But yes, it is still an inelegant and somewhat strange-looking option.

At the same time, perhaps it can be dressed up more elegantly
somehow - for example, by adding, as a first-class citizen of
ComputeXidHorizonsResult, a list of blocks to clear some relations.

> But regardless of second scan snapshots, I think we can worry about
> that part at a later moment: The first scan phase is usually the most
> expensive and takes the most time of all phases that hold snapshots,
> and in the above discussion we agreed that we can already reduce the
> time that a snapshot is held during that phase significantly. Sure, it
> isn't great that we have to scan the table again with only a single
> snapshot, but generally phase 2 doesn't have that much to do (except
> when BRIN indexes are involved) so this is likely less of an issue.
> And even if it is, we would still have reduced the number of
> long-lived snapshots by half.

Hmm, but it looks like we don't have the infrastructure to "update" the
xmin that propagates to the horizon after the first snapshot in a
transaction has been taken.

One option I know of is to reuse the approach of commit
d9d076222f5b94a85e0e318339cfc44b8f26022d [1].
But in that case there is no point in re-taking the snapshot during
the first phase - just apply this "if" only for the first phase, and
you're done.
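
To make the idea concrete, here is a toy model of a horizon computation
that skips flagged backends, which is how I understand the "if" in [1].
This is only a sketch and not PostgreSQL source: ToyXid, ToyBackend,
TOY_IN_SAFE_IC and toy_data_horizon are hypothetical names, and xids are
compared as plain integers, whereas real xid comparison is
wraparound-aware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration; NOT the PostgreSQL definitions. */
typedef uint32_t ToyXid;

#define TOY_INVALID_XID 0u
#define TOY_IN_SAFE_IC  0x01u	/* hypothetical: backend runs a "safe" R/CIC */

typedef struct ToyBackend
{
	ToyXid		xmin;		/* oldest xid its snapshot needs, or invalid */
	uint8_t		flags;		/* status flags, e.g. TOY_IN_SAFE_IC */
} ToyBackend;

/*
 * Return the oldest xmin that still prevents removal of dead table tuples.
 * Starts from next_xid (nothing is holding the horizon back) and lowers it
 * for every backend with a valid xmin, except backends flagged
 * TOY_IN_SAFE_IC, whose long-lived snapshot is deliberately ignored.
 */
ToyXid
toy_data_horizon(const ToyBackend *backends, size_t n, ToyXid next_xid)
{
	ToyXid		horizon = next_xid;

	for (size_t i = 0; i < n; i++)
	{
		if (backends[i].xmin == TOY_INVALID_XID)
			continue;		/* no snapshot held */

		/* This is the "if": flagged backends do not hold back the horizon. */
		if (backends[i].flags & TOY_IN_SAFE_IC)
			continue;

		if (backends[i].xmin < horizon)
			horizon = backends[i].xmin;
	}

	return horizon;
}
```

In this model, setting the flag only during the first phase is enough
for the first scan's snapshot to stop holding back the horizon, without
re-taking the snapshot.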

Do you know any less-hacky way? Or is it a nice way to go?

[1]: https://github.com/postgres/postgres/commit/d9d076222f5b94a85e0e318339cfc44b8f26022d#diff-8879f0173be303070ab7931db7c757c96796d84402640b9e386a4150ed97b179R1779-R1793
