| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Rui Zhao <zhaorui126(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org, andres(at)anarazel(dot)de |
| Subject: | Re: Separate catalog_xmin from xmin in walsender hot standby feedback |
| Date: | 2026-06-08 11:29:44 |
| Message-ID: | CAA4eK1L4UMHwmtuD0ppTFvWbGQpHhBjxsuS-rSwVoiYQN=fYyw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Jun 8, 2026 at 1:27 PM Rui Zhao <zhaorui126(at)gmail(dot)com> wrote:
>
> Attaching v2, rebased on top of current master (v1 no longer applied
> cleanly per cfbot). Two improvements were also made since v1:
>
> 1. Test stability
> The TAP test (053_hs_feedback_catalog_xmin.pl) replaced a sleep(2)
> with a poll on pg_stat_replication.reply_time, so it deterministically
> waits for a fresh hot standby feedback round to reach the primary
> instead of relying on wall-clock time. This removes the most obvious
> source of flakiness on slower CI runners.
>
> 2. Performance impact (pgbench)
> I was concerned about adding a UINT32_ACCESS_ONCE(proc->catalog_xmin)
> read and an extra TransactionIdOlder() call per PGPROC iteration in
> ComputeXidHorizons(), which is on a hot path. Measured on a 104-core
> box, scale=30, 64 clients / 32 jobs / 60s, -M prepared, optimized
> build (CFLAGS=-O2, no --enable-cassert):
>
> workload baseline median patched median delta
> ---------------------- --------------- -------------- -----
> pgbench -S (read-only) 1,596,196 1,617,083 +1.3%
> pgbench (read-write) 129,550 132,024 +1.9%
>
> Numbers are medians of 3 warm runs after a 30s warmup, after I
> discarded the cold-start runs which were dominated by cache priming.
> The patched build comes out marginally above baseline, which is of
> course noise -- adding code does not make things faster -- but it
> confirms the change is below the measurement floor on this hardware.
> Happy to rerun on different shapes (more clients, smaller scale,
> different machine) if anyone wants to see specific numbers.
>
> Summary of the change (unchanged in substance from v1):
>
> - proc.h: add catalog_xmin field to PGPROC (4 bytes)
> - proc.c: initialize it in InitProcess / InitAuxiliaryProcess
> - procarray.c: accumulate proc_catalog_xmin in ComputeXidHorizons()
> and apply it only to catalog_oldest_nonremovable and
> shared_oldest_nonremovable; include it in GetReplicationHorizons()
> so the catalog_xmin propagates correctly in cascading setups
> - walsender.c: in the no-slot path of ProcessStandbyHSFeedbackMessage(),
> set MyProc->xmin and MyProc->catalog_xmin separately instead of
> folding them together
>
I think storing catalog_xmin separately in PGPROC has few downsides
which are (a) it always needs additional four bytes in PGPROC which is
used for backend and other processes even though it is required for
walsender, (b) as both xmin and catalog_xmin are written separately
ComputeXidHorizons() could read one value as updated and other stale.
There may be something more fundamental which I may be missing but I
feel ephermal slots idea as hinted in comments is worth exploring.
--
With Regards,
Amit Kapila.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Akshay Joshi | 2026-06-08 11:30:32 | Re: [PATCH] Add pg_get_table_ddl() to reconstruct CREATE TABLE statements |
| Previous Message | Ilia Evdokimov | 2026-06-08 11:12:20 | Remove redundant DISTINCT when GROUP BY already guarantees uniqueness |