| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Antonin Houska <ah(at)cybertec(dot)at> |
| Cc: | Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2026-05-10 11:31:04 |
| Message-ID: | CAA4eK1KC6CGN-N2bUffSign8Sw4q6=8d3L-Xh4t+50GCdQb6zw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, May 5, 2026 at 6:17 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> I think the problem is that, with a database-specific snapshot,
> SnapBuildProcessRunningXacts() returns early, without adjusting builder->xmin:
>
> /*
> * Database specific transaction info may exist to reach CONSISTENT state
> * faster, however the code below makes no use of it. Moreover, such
> * record might cause problems because the following normal (cluster-wide)
> * record can have lower value of oldestRunningXid. In that case, let's
> * wait with the cleanup for the next regular cluster-wide record.
> */
> if (OidIsValid(running->dbid))
> return;
>
> and thus some transactions whose XID is below running->oldestRunningXid may
> continue to be incorrectly considered running.
>
> I originally thought that this should not happen because such transactions
> will be added to the builder's array of committed transactions by
> SnapBuildCommitTxn() anyway. However, I failed to notice that the COMMIT record
> of a transaction listed in the xl_running_xacts WAL record is not guaranteed to
> follow the xl_running_xacts record in WAL. In other words, even if
> xl_running_xacts is created before the COMMIT record of the contained
> transaction, it may end up at a higher LSN in WAL. So the cleanup I relied on
> might not take place.
>
BTW, is it possible to write a test using injection_points, or via
manual steps (using a debugger, etc.), so that we can more clearly
understand this problem and the proposed fix?
--
With Regards,
Amit Kapila.