Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations

From: Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Hannu Krosing <hannuk(at)google(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipkumarb(at)google(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>
Subject: Re: Patch: VACUUM should ignore (CREATE |RE)INDEX CONCURRENTLY for xmin horizon calculations
Date: 2025-11-27 02:30:00
Message-ID: CADzfLwXN4NXv8C+8GzbMJvRaBkJMs838c92CM-6Js-=Wpi5aRQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello!

> If you're interested in improving CIC and reducing its impact on
> visibility horizons, you may be interested in reviewing Mihail's work
> in [0].

Actually I reduced the scope of the patch to only
single-heap-scan-STIR-based-CIC. But patch files are still available.
I did it because it looks like we need first a solution which also
will work for REPACK concurrently, see [1].
Reset-snapshot technique may still be applied later though (to reduce
bloat in the target table).

> In that case VACUUM and prune operations would have to build
> visibility horizons for each table, and I don't think that's viable;
> especially so in a heavily partitioned workload. It'd also be
> prohibitively expensive to add a per-table Xmin -- we don't always
> know which tables will be accessed by any backend until they lock that
> table, but at that time they may already have a snapshot that they
> need tuples from. If we lazily included that backend, there may be
> visibility horizons and prune operations that were built and executed
> ahead of them notifying the other backends of their use, and pruned
> away still-visible tuples like in the aforementioned CIC hot pruning
> issue.

I think I know the way to implement it with almost zero cost for a
"normal" situation.
It is based on Antonin's idea to change
GetOldestNonRemovableTransactionId(rel) because we have Relation as an
argument.
We may put some "pinned" xmin into Relation and fill it only while
filling RelCache by accessing some shared memory.

Of course there is a lot of things we need to worry about:
* make sure every backend will respect it (invalidate cache and wait
for each backend to "know" about it)
* make sure horizon moved only forward
* handle all kind of error of backend who "pinned" horizon
* etc

In a few words, protocol of "pinning" is next:
* get a "normal snapshot"
* put its xmin into shared memory as newest horizon value possible for
a particular relation
* invalidate RelCache for that relation
* wait for all transaction having any locks on that relation to commit
(and, as result, drop old cached Relation)
* we a free to release snapshot and set some PROC flag to ignore our
xmin for data (not catalog) horizon
* GetOldestNonRemovableTransactionId(rel) will return
TransactionIdOlder(actual_horizon, pinned) taking "pinned" from `rel`.

If someone is interested - some dirty version is available at [1], but
it is better to wait for a more-less polished patchset.

[0]: https://www.postgresql.org/message-id/flat/202510301734.pj4uds3mqxx4%40alvherre.pgsql#fd20662912580a89b860790f9729aaff
[1]: https://github.com/michail-nikolaev/postgres/commits/make-PROC_IN_SAFE_IC-great-again/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2025-11-27 02:46:10 Re: Allow GUC settings in CREATE SUBSCRIPTION CONNECTION to take effect
Previous Message Mihail Nikalayeu 2025-11-27 01:54:00 Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY