Re: Add last_commit_lsn to pg_stat_database

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: Add last_commit_lsn to pg_stat_database
Date: 2024-03-07 17:30:35
Message-ID: ed034d76-3070-4968-a102-ca8655d19f33@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I've previously noted in "Add last commit LSN to
> pg_last_committed_xact()" [1] that it's not possible to monitor how
> many bytes of WAL behind a logical replication slot is (computing such
> is obviously trivial for physical slots) because the slot doesn't need
> to replicate beyond the last commit. In some cases it's possible for
> the current WAL location to be far beyond the last commit. A few
> examples:
>
> - An idle server with checkout_timeout at a lower value (so lots of
> empty WAL advance).
> - Very large transactions: particularly insidious because committing a
> 1 GB transaction after a small transaction may show almost zero time
> lag even though quite a bit of data needs to be processed and sent
> over the wire (so time to replay is significantly different from
> current lag).
> - A cluster with multiple databases complicates matters further,
> because while physical replication is cluster-wide, the LSNs that
> matter for logical replication are database specific.
>
>
> Since we don't expose the most recent commit's LSN there's no way to
> say "the WAL is currently 1250, the last commit happened at 1000, the
> slot has flushed up to 800, therefore there are at most 200 bytes
> replication needs to read through to catch up.

I'm not sure I fully understand the problem. What are you doing
currently to measure the lag? If you look at pg_replication_slots today,
confirmed_flush_lsn advances also when you do pg_switch_wal(), so
looking at the diff between confirmed_flush_lsn and pg_current_wal_lsn()
works, no?

And on the other hand, even if you expose the database's last commit
LSN, you can have an publication that includes only a subset of tables.
Or commits that don't write to any table at all. So I'm not sure why the
database's last commit LSN is relevant. Getting the last LSN that did
something that needs to be replicated through the publication might be
useful, but that's not what what this patch does.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-03-07 17:32:35 Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).
Previous Message Noah Misch 2024-03-07 17:28:59 Re: 035_standby_logical_decoding unbounded hang