Re: persist logical slots to disk during shutdown checkpoint

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: persist logical slots to disk during shutdown checkpoint
Date: 2023-08-27 18:32:28
Message-ID: CALDaNm3jY-xiDhEoJwTYHiBWiY+4DpNjZbddNozE7ZyOXKLvFg@mail.gmail.com
Lists: pgsql-hackers

On Fri, 25 Aug 2023 at 17:40, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sat, 19 Aug 2023 at 11:53, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > It's entirely possible for a logical slot to have a confirmed_flush
> > LSN higher than the last value saved on disk while not being marked as
> > dirty. It's currently not a problem to lose that value during a clean
> > shutdown / restart cycle, but to support the upgrade of logical slots
> > [1] (see latest patch at [2]), we seem to rely on that value being
> > properly persisted to disk. During the upgrade, we need to verify that
> > all the data prior to the shutdown_checkpoint for the logical slots has
> > been consumed; otherwise, the downstream may miss some data. Now, to
> > ensure the same, we are planning to compare the confirmed_flush LSN
> > location with the latest shutdown_checkpoint location, which means that
> > the confirmed_flush LSN must be correctly persisted across the restart.
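> >
> > A rough sketch of that comparison (illustrative SQL only; the actual
> > upgrade check is done in C against the persisted slot data, and the
> > checkpoint LSN would come from pg_controldata's "Latest checkpoint
> > location" of the old cluster, e.g. 0/1532198 in the test further
> > below):
> >
> > SELECT slot_name, confirmed_flush_lsn
> > FROM pg_replication_slots
> > WHERE slot_type = 'logical'
> >   AND confirmed_flush_lsn <> '0/1532198'::pg_lsn;
> > -- any row returned indicates a slot that may not have consumed all
> > -- the WAL prior to the shutdown checkpoint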
> >
> > I think this is inefficient even without an upgrade because, after the
> > restart, this may lead to decoding some data again. Say, we process
> > some transactions for which we didn't send anything downstream (the
> > changes got filtered) but the confirmed_flush LSN is updated due to
> > keepalives. As we don't flush the latest value of the confirmed_flush
> > LSN, it may lead to processing the same changes again.
>
> I was able to test and verify that, with the patch, we do not process
> the same changes again after a restart.
> Note: The 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
> adds a log line when decoding of a transaction is skipped and another
> when an operation is filtered out.
> The test.sh script has the steps for: a) setting up logical replication
> for a table, b) performing an insert on a table that is published (this
> will be replicated to the subscriber), c) performing an insert on a
> table that is not published (this insert will be filtered and not
> replicated), d) sleeping for 5 seconds, e) stopping the server, and
> f) starting the server. (A minimal sketch of these steps is shown
> below.)
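>
> A minimal sketch of those steps (node name N1 and table names
> tbl1/tbl2 are assumptions based on the description and the logs in
> this mail):
>
> # on publisher N1: tbl1 is in the publication, tbl2 is not
> psql -d postgres -c "INSERT INTO tbl1 VALUES (generate_series(1, 5));"
> psql -d postgres -c "INSERT INTO tbl2 VALUES (generate_series(1, 5));"
> sleep 5
> pg_ctl -D N1 -l N1.log stop
> pg_ctl -D N1 -l N1.log start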
> I used the following steps in HEAD:
> a) apply 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
> on HEAD and build the binaries, b) execute test.sh, c) view the N1.log
> file to see that the insert operations are filtered again after the
> restart, as shown by the following logs:
> LOG: Filter insert for table tbl2
> ...
> ===restart===
> ...
> LOG: Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
> ...
> LOG: Filter insert for table tbl2
>
> We can see that the insert operations on tbl2 which were filtered
> before the server was stopped are filtered again after the restart in HEAD.
>
> Let's verify that the same changes are not processed again with the patch:
> a) apply v4-0001-Persist-to-disk-logical-slots-during-a-shutdown-c.patch
> from [1] along with
> 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch and
> build the binaries, b) execute test.sh, c) view the N1.log file to see
> that the transactions are skipped after the restart of the server, as
> shown by the following logs:
> LOG: Filter insert for table tbl2
> ...
> ===restart===
> ...
> LOG: Skipping transaction 0/156AD10 as start decode at is greater 0/156AFB0
> ...
> LOG: Skipping transaction 0/156AE80 as start decode at is greater 0/156AFB0
>
> We can see that the insert operations on tbl2 are not processed again
> after the restart with the patch.

Here is another way to test this, using the pg_replslotdata approach
that was proposed earlier at [1].
I have rebased that patch on top of HEAD; the v5 version is attached.
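With the patch applied, the tool can be run against the stopped node's
data directory, along these lines (the -D invocation shown here is an
assumption based on the earlier proposal):

./pg_replslotdata -D N1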

We can use the same test.sh shared at [2].
When executed on HEAD, it was noticed that confirmed_flush points to a
WAL location before both transactions:
slot_name | slot_type | datoid | persistency | xmin | catalog_xmin | restart_lsn | confirmed_flush | two_phase_at | two_phase | plugin
----------+-----------+--------+-------------+------+--------------+-------------+-----------------+--------------+-----------+---------
sub       | logical   |      5 | persistent  |    0 |          735 | 0/1531E28   | 0/1531E60       | 0/0          |         0 | pgoutput

WAL record information for the records at and after the confirmed_flush
LSN 0/1531E60 was generated using pg_walinspect.
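A query along these lines was used (a sketch based on pg_walinspect's
pg_get_wal_records_info(); the exact invocation may have differed):

SELECT row_number() OVER (ORDER BY start_lsn) AS row_number, *
FROM pg_get_wal_records_info('0/1531E60', pg_current_wal_lsn());

The output (reformatted here for readability, with description and
block_ref on continuation lines):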
row_number | start_lsn | end_lsn   | prev_lsn  | xid | resource_manager | record_type         | record_length | main_data_length | fpi_length
-----------+-----------+-----------+-----------+-----+------------------+---------------------+---------------+------------------+-----------
         1 | 0/1531E60 | 0/1531EA0 | 0/1531E28 |   0 | Heap2            | PRUNE               |            57 |                9 |          0
             description: snapshotConflictHorizon: 0, nredirected: 0, ndead: 1, nunused: 0, redirected: [], dead: [1], unused: []
             block_ref: blkref #0: rel 1663/5/1255 fork main blk 58
         2 | 0/1531EA0 | 0/1531EE0 | 0/1531E60 | 735 | Heap             | INSERT+INIT         |            59 |                3 |          0
             description: off: 1, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16384 fork main blk 0
         3 | 0/1531EE0 | 0/1531F20 | 0/1531EA0 | 735 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 2, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16384 fork main blk 0
         4 | 0/1531F20 | 0/1531F60 | 0/1531EE0 | 735 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 3, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16384 fork main blk 0
         5 | 0/1531F60 | 0/1531FA0 | 0/1531F20 | 735 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 4, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16384 fork main blk 0
         6 | 0/1531FA0 | 0/1531FE0 | 0/1531F60 | 735 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 5, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16384 fork main blk 0
         7 | 0/1531FE0 | 0/1532028 | 0/1531FA0 | 735 | Transaction      | COMMIT              |            46 |               20 |          0
             description: 2023-08-27 23:22:17.161215+05:30
         8 | 0/1532028 | 0/1532068 | 0/1531FE0 | 736 | Heap             | INSERT+INIT         |            59 |                3 |          0
             description: off: 1, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16387 fork main blk 0
         9 | 0/1532068 | 0/15320A8 | 0/1532028 | 736 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 2, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16387 fork main blk 0
        10 | 0/15320A8 | 0/15320E8 | 0/1532068 | 736 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 3, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16387 fork main blk 0
        11 | 0/15320E8 | 0/1532128 | 0/15320A8 | 736 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 4, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16387 fork main blk 0
        12 | 0/1532128 | 0/1532168 | 0/15320E8 | 736 | Heap             | INSERT              |            59 |                3 |          0
             description: off: 5, flags: 0x08
             block_ref: blkref #0: rel 1663/5/16387 fork main blk 0
        13 | 0/1532168 | 0/1532198 | 0/1532128 | 736 | Transaction      | COMMIT              |            46 |               20 |          0
             description: 2023-08-27 23:22:17.174756+05:30
        14 | 0/1532198 | 0/1532210 | 0/1532168 |   0 | XLOG             | CHECKPOINT_SHUTDOWN |           114 |               88 |          0
             description: redo 0/1532198; tli 1; prev tli 1; fpw true; xid 0:737; oid 16399; multi 1; offset 0;
                          oldest xid 723 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0;
                          oldest running xid 0; shutdown
(14 rows)

Whereas the same test executed with the patch applied shows that
confirmed_flush points to the CHECKPOINT_SHUTDOWN record:
slot_name | slot_type | datoid | persistency | xmin | catalog_xmin | restart_lsn | confirmed_flush | two_phase_at | two_phase | plugin
----------+-----------+--------+-------------+------+--------------+-------------+-----------------+--------------+-----------+---------
sub       | logical   |      5 | persistent  |    0 |          735 | 0/1531E28   | 0/1532198       | 0/0          |         0 | pgoutput

WAL record information for the records at and after the confirmed_flush
LSN 0/1532198, generated with the same pg_walinspect query as above
(again reformatted for readability):
row_number | start_lsn | end_lsn   | prev_lsn  | xid | resource_manager | record_type         | record_length | main_data_length | fpi_length
-----------+-----------+-----------+-----------+-----+------------------+---------------------+---------------+------------------+-----------
         1 | 0/1532198 | 0/1532210 | 0/1532168 |   0 | XLOG             | CHECKPOINT_SHUTDOWN |           114 |               88 |          0
             description: redo 0/1532198; tli 1; prev tli 1; fpw true; xid 0:737; oid 16399; multi 1; offset 0;
                          oldest xid 723 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0;
                          oldest running xid 0; shutdown
(1 row)

[1] - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm2BboFuFVYxyzP4wkv7%3D8%2B_TwsD%2BugyGhtibTSF4m4XRg%40mail.gmail.com

Regards,
Vignesh

Attachment Content-Type Size
v5-0001-pg_replslotdata.patch text/x-patch 22.7 KB
