Re: Does having pg_last_wal_replay_lsn[replica] >= pg_current_wal_insert_lsn[master] guarantee that the replica is caught up?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: dmitry(dot)koterov(at)gmail(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Does having pg_last_wal_replay_lsn[replica] >= pg_current_wal_insert_lsn[master] guarantee that the replica is caught up?
Date: 2022-08-03 07:32:17
Message-ID: 20220803.163217.1789690807623885906.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm not sure this fits -hackers..

At Tue, 2 Aug 2022 18:57:41 -0700, Dmitry Koterov <dmitry(dot)koterov(at)gmail(dot)com> wrote in
> Hi.
>
> Suppose on master, I run a *multi-query* using PQexec and save the value
> returned by pg_current_wal_insert_lsn:
>
> master_lsn = query(master, "INSERT INTO some VALUES (...); SELECT
> pg_current_wal_insert_lsn()")
>
> Then I run a PQexec query on a replica and save the value returned by
> pg_last_wal_replay_lsn:
>
> replica_lsn = query(replica, "SELECT pg_last_wal_replay_lsn()")
>
> The question to experts in PG internals: *is it guaranteed that, as long as
> replica_lsn >= master_lsn (GREATER OR EQUAL, not just greater), then a
> subsequent read from replica will always return me the inserted record*
> (i.e. the replica is up to date), considering noone updates/deletes in that
> table?

https://www.postgresql.org/docs/devel/libpq-exec.html

> The command string can include multiple SQL commands (separated by
> semicolons). Multiple queries sent in a single PQexec call are
> processed in a single transaction, unless there are explicit
> BEGIN/COMMIT commands included in the query string to divide it into
> multiple transactions.

If the query() runs PQexec() with the same string, the call to
pg_current_wal_insert_lsn() is made before the insert is commited.
That behavior can be emulated on psql. (The backslash before semicolon
is crucial. It lets the connected queries be sent in a single
PQexec())

=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E5038
(1 row)
=# insert into t values(0)\; select pg_current_wal_lsn();
INSERT 0 1
pg_current_wal_lsn
--------------------
0/68E5038
(1 row)
=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E50A0
(1 row)

$ pg_waldump -s'0/68E5038' -e'0/68E50A0' $PGDATA/pg_wal/000000010000000000000006
rmgr: Heap len (rec/tot): 59/ 59, tx: 770, lsn: 0/068E5038, prev 0/068E5000, desc: INSERT off 15 flags 0x00, blkref #0: rel 1663/5/16424 blk 0
rmgr: Transaction len (rec/tot): 34/ 34, tx: 770, lsn: 0/068E5078, prev 0/068E5038, desc: COMMIT 2022-08-03 15:49:43.749158 JST

So, the replica cannot show the inserted data at the LSN the function
returned. If you explicitly ended transaction before
pg_current_wal_insert_lsn() call, the expected LSN would be returned.

=# select pg_current_wal_insert_lsn();
pg_current_wal_insert_lsn
---------------------------
0/68E75C8
(1 row)
=# begin\;insert into t values(0)\;commit\; select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/68E7958
$ pg_waldump -s'0/68E75C8' -e'0/68E7958' $PGDATA/pg_wal/000000010000000000000006
prev 0/068E7590, desc: INSERT off 22 flags 0x00, blkref #0: rel 1663/5/16424 blk 0 FPW
rmgr: Transaction len (rec/tot): 34/ 34, tx: 777, lsn: 0/068E7930, prev 0/068E75C8, desc: COMMIT 2022-08-03 16:09:13.516498 JST

> I'm asking, because according to some hints in the docs, this should be
> true. But for some reason, we have to use "greater" (not "greater or
> equals") condition in the real code, since with just ">=" the replica
> doesn't sometimes read the written data.

Thus the wrong LSN appears to have caused the behavior.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-08-03 07:49:24 Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Previous Message Noah Misch 2022-08-03 07:28:47 Re: Race between KeepFileRestoredFromArchive() and restartpoint