Re: Commit Sequence Numbers and Visibility

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Commit Sequence Numbers and Visibility
Date: 2026-06-11 00:49:59
Message-ID: aa8bbd110167a11fe5ae908fd424f46a8c4e58f3.camel@j-davis.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2026-06-10 at 21:18 +0200, Matthias van de Meent wrote:
> I think it's desireable that snapshots don't need to take special
> care
> around the durability of the transactions that are included in their
> snapshot.
> Async transactions may want to see sync transactions' durable data,
> but probably prefer not to have to wait for that durability just
> because its session logged its latest commit record after the sync
> transaction did.

For the problem sequence with CSNs:

1. sync transaction T1 writes commit record
2. async transaction T2 writes commit record
3. T2 releases locks
4. T3 takes a snapshot
5. T1's commit record is flushed
6. T1 releases locks

there are three resolutions:

a. force T3 to wait until T1's commit record is flushed before
using the snapshot, slowing down the sync part of the
workload; or

b. force T2 to wait until all unflushed sync transactions with
an earlier commit LSN are flushed before releasing locks
(thereby making the above seqeunce impossible), slowing down
the async part of the workload; or

c. let T3 use the snapshot immediately, potentially returning
unflushed data to the client.

Perhaps none of those options is great for everyone, but we could allow
users to select the behavior they want. That seems better than today,
when any async transaction can cause sync transactions to start
returning unflushed data to the client, and there's no way to prevent
that.

> >
> Visibility would presumably happen after the recovery/replica has
> made
> sure that the durability of the pending-visible transactions is
> guaranteed; which presumably could be done with a sync wait at
> end-of-recovery.

That might be fine, but it's different from CSNs as I understand the
meaning (at least in the simplest and most intuitive meaning). Should
we call them "Visibility Sequence Numbers" (VSNs) instead?

For instance, given a sequence like (T1-T3 all sync transactions):

1. T1 writes commit record at LSN 123
2. T2 writes commit record at LSN 124
3. Flush to LSN 124
4. T2 writes commit-visibility record at LSN 125
5. T3 takes a snapshot with VSN=125 and returns data to client that
includes T2's changes but not T1's changes
6. crash

Recovery does not have enough information to know whether T1 or T2
became visible first, so we'd probably need to create a new "multi-
commit-visible" record that makes them all visible at the same LSN.
That would mean that the snapshot taken by T3 (that was externally
observed) could never exist in the recovered system, even though both
T1 and T2 exist.

That might have consequences for PITR. With recovery_target_xid, do you
recover up to its commit record or its commit-visibility record?

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2026-06-11 00:57:50 Re: DOCS - Add missing EXCEPT parameter description to ALTER PUBLICATION
Previous Message Jacob Champion 2026-06-11 00:49:47 Re: Having problems generating a code coverage report