From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: persist logical slots to disk during shutdown checkpoint
Date: 2023-08-27 18:32:28
Message-ID: CALDaNm3jY-xiDhEoJwTYHiBWiY+4DpNjZbddNozE7ZyOXKLvFg@mail.gmail.com
Lists: pgsql-hackers
On Fri, 25 Aug 2023 at 17:40, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sat, 19 Aug 2023 at 11:53, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > It's entirely possible for a logical slot to have a confirmed_flush
> > LSN higher than the last value saved on disk while not being marked as
> > dirty. It's currently not a problem to lose that value during a clean
> > shutdown / restart cycle but to support the upgrade of logical slots
> > [1] (see latest patch at [2]), we seem to rely on that value being
> > properly persisted to disk. During the upgrade, we need to verify that
> > all the data prior to the shutdown checkpoint for the logical slots has
> > been consumed; otherwise, the downstream may miss some data. Now, to
> > ensure this, we are planning to compare the confirmed_flush LSN
> > location with the latest shutdown checkpoint location, which means that
> > the confirmed_flush LSN should be up to date after restart.
> >
> > I think this is inefficient even without an upgrade because, after the
> > restart, this may lead to decoding some data again. Say, we process
> > some transactions for which we didn't send anything downstream (the
> > changes got filtered) but the confirmed_flush LSN is updated due to
> > keepalives. As we don't flush the latest value of the confirmed_flush
> > LSN, it may lead to processing the same changes again.
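The lost-update scenario described above can be sketched with a toy model (hypothetical Python, not PostgreSQL internals): a keepalive advances confirmed_flush in memory without marking the slot dirty, so a shutdown that persists only dirty slots loses the advance, and decoding after restart resumes from the stale on-disk value.

```python
# Toy model of slot persistence (illustration only, not PostgreSQL code).
class ToySlot:
    def __init__(self, confirmed_flush):
        self.confirmed_flush = confirmed_flush  # in-memory value
        self.on_disk = confirmed_flush          # last value saved to disk
        self.dirty = False

    def advance_by_keepalive(self, lsn):
        # Keepalive feedback moves confirmed_flush but does not dirty the slot.
        self.confirmed_flush = max(self.confirmed_flush, lsn)

    def shutdown_checkpoint(self, persist_all=False):
        # Current behaviour: persist only dirty slots.
        # Patched behaviour (persist_all=True): persist every logical slot.
        if self.dirty or persist_all:
            self.on_disk = self.confirmed_flush
            self.dirty = False

    def restart(self):
        # After restart, decoding resumes from the on-disk value.
        self.confirmed_flush = self.on_disk
        return self.confirmed_flush

slot = ToySlot(confirmed_flush=100)
slot.advance_by_keepalive(200)   # filtered changes acked via keepalive
slot.shutdown_checkpoint()       # slot not dirty, so nothing is written
assert slot.restart() == 100     # WAL from 100..200 is decoded again

slot.advance_by_keepalive(200)
slot.shutdown_checkpoint(persist_all=True)  # patched behaviour
assert slot.restart() == 200                # no re-decoding after restart
```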
>
> I was able to test and verify that we do not process the same
> changes again with the patch applied.
> Note: The 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
> has logs to print if a decode transaction is skipped and also a log to
> mention if any operation is filtered.
> The test.sh script has the steps for: a) setting up logical replication
> for a table b) performing an insert on a table that is published (this
> will be replicated to the subscriber) c) performing an insert on a table
> that is not published (this insert will be filtered and not
> replicated) d) sleeping for 5 seconds e) stopping the server f) starting
> the server.
> I used the following steps on HEAD:
> a) Apply the 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
> patch on HEAD and build the binaries b) execute test.sh c) view the N1.log
> file to see that the insert operations were filtered again, as shown by
> the following logs:
> LOG: Filter insert for table tbl2
> ...
> ===restart===
> ...
> LOG: Skipping transaction 0/156AD10 as start decode at is greater 0/156AE40
> ...
> LOG: Filter insert for table tbl2
>
> We can see that the insert operations on tbl2, which were filtered
> before the server was stopped, are filtered again after restart on HEAD.
>
> Let's verify that the same changes are not processed again with the patch:
> a) Apply v4-0001-Persist-to-disk-logical-slots-during-a-shutdown-c.patch
> from [1] along with
> 0001-Add-logs-to-skip-transaction-filter-insert-operation.patch
> and build the binaries b) execute test.sh c) view the N1.log file to see
> that the insert operations were skipped after the server restart, as
> shown by the following logs:
> LOG: Filter insert for table tbl2
> ...
> ===restart===
> ...
> Skipping transaction 0/156AD10 as start decode at is greater 0/156AFB0
> ...
> Skipping transaction 0/156AE80 as start decode at is greater 0/156AFB0
>
> We can see that the insert operations on tbl2 are not processed again
> after restart with the patch.
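The skip check behind the "Skipping transaction ... as start decode at is greater" log lines above can be sketched as follows (a hypothetical Python model, not the actual decoding code): a transaction whose commit LSN lies before the restored start-decoding point is skipped instead of being decoded and filtered again.

```python
# Toy model of the start-decoding-at skip check (illustration only).
def replay_after_restart(transactions, start_decoding_at):
    """transactions: list of (commit_lsn, description) in WAL order."""
    decoded = []
    for commit_lsn, desc in transactions:
        if commit_lsn < start_decoding_at:
            # Mirrors the "Skipping transaction ..." debug log.
            print(f"Skipping transaction 0/{commit_lsn:X} as start decode "
                  f"at is greater 0/{start_decoding_at:X}")
        else:
            decoded.append(desc)
    return decoded

txns = [(0x156AD10, "published insert"), (0x156AE80, "filtered insert")]

# HEAD: start decoding from a stale on-disk confirmed_flush, so the
# second (filtered) transaction is decoded and filtered all over again.
assert replay_after_restart(txns, 0x156AE40) == ["filtered insert"]

# Patched: confirmed_flush persisted at shutdown lies past both
# transactions, so neither is processed again.
assert replay_after_restart(txns, 0x156AFB0) == []
```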
Here is another way to test this, using the pg_replslotdata approach that
was proposed earlier at [1].
I have rebased it on top of HEAD; the v5 version is attached.
We can use the same test.sh shared at [2].
When executed on HEAD, we can see that confirmed_flush points to a WAL
location before both transactions:
slot_name | slot_type | datoid | persistency | xmin | catalog_xmin | restart_lsn | confirmed_flush | two_phase_at | two_phase | plugin
----------+-----------+--------+-------------+------+--------------+-------------+-----------------+--------------+-----------+---------
sub       | logical   | 5      | persistent  | 0    | 735          | 0/1531E28   | 0/1531E60       | 0/0          | 0         | pgoutput
WAL record information generated using pg_walinspect for various
records at and after confirmed_flush WAL 0/1531E60:
row | start_lsn | end_lsn   | prev_lsn  | xid | resource_manager | record_type         | rec_len | main_len | fpi_len | description | block_ref
----+-----------+-----------+-----------+-----+------------------+---------------------+---------+----------+---------+-------------+----------
  1 | 0/1531E60 | 0/1531EA0 | 0/1531E28 |   0 | Heap2            | PRUNE               |      57 |        9 |       0 | snapshotConflictHorizon: 0, nredirected: 0, ndead: 1, nunused: 0, redirected: [], dead: [1], unused: [] | blkref #0: rel 1663/5/1255 fork main blk 58
  2 | 0/1531EA0 | 0/1531EE0 | 0/1531E60 | 735 | Heap             | INSERT+INIT         |      59 |        3 |       0 | off: 1, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
  3 | 0/1531EE0 | 0/1531F20 | 0/1531EA0 | 735 | Heap             | INSERT              |      59 |        3 |       0 | off: 2, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
  4 | 0/1531F20 | 0/1531F60 | 0/1531EE0 | 735 | Heap             | INSERT              |      59 |        3 |       0 | off: 3, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
  5 | 0/1531F60 | 0/1531FA0 | 0/1531F20 | 735 | Heap             | INSERT              |      59 |        3 |       0 | off: 4, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
  6 | 0/1531FA0 | 0/1531FE0 | 0/1531F60 | 735 | Heap             | INSERT              |      59 |        3 |       0 | off: 5, flags: 0x08 | blkref #0: rel 1663/5/16384 fork main blk 0
  7 | 0/1531FE0 | 0/1532028 | 0/1531FA0 | 735 | Transaction      | COMMIT              |      46 |       20 |       0 | 2023-08-27 23:22:17.161215+05:30 |
  8 | 0/1532028 | 0/1532068 | 0/1531FE0 | 736 | Heap             | INSERT+INIT         |      59 |        3 |       0 | off: 1, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
  9 | 0/1532068 | 0/15320A8 | 0/1532028 | 736 | Heap             | INSERT              |      59 |        3 |       0 | off: 2, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
 10 | 0/15320A8 | 0/15320E8 | 0/1532068 | 736 | Heap             | INSERT              |      59 |        3 |       0 | off: 3, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
 11 | 0/15320E8 | 0/1532128 | 0/15320A8 | 736 | Heap             | INSERT              |      59 |        3 |       0 | off: 4, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
 12 | 0/1532128 | 0/1532168 | 0/15320E8 | 736 | Heap             | INSERT              |      59 |        3 |       0 | off: 5, flags: 0x08 | blkref #0: rel 1663/5/16387 fork main blk 0
 13 | 0/1532168 | 0/1532198 | 0/1532128 | 736 | Transaction      | COMMIT              |      46 |       20 |       0 | 2023-08-27 23:22:17.174756+05:30 |
 14 | 0/1532198 | 0/1532210 | 0/1532168 |   0 | XLOG             | CHECKPOINT_SHUTDOWN |     114 |       88 |       0 | redo 0/1532198; tli 1; prev tli 1; fpw true; xid 0:737; oid 16399; multi 1; offset 0; oldest xid 723 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown |
Whereas the same test executed with the patch applied shows that
confirmed_flush points to CHECKPOINT_SHUTDOWN record:
slot_name | slot_type | datoid | persistency | xmin | catalog_xmin | restart_lsn | confirmed_flush | two_phase_at | two_phase | plugin
----------+-----------+--------+-------------+------+--------------+-------------+-----------------+--------------+-----------+---------
sub       | logical   | 5      | persistent  | 0    | 735          | 0/1531E28   | 0/1532198       | 0/0          | 0         | pgoutput
WAL record information generated using pg_walinspect for various
records at and after confirmed_flush WAL 0/1532198:
row | start_lsn | end_lsn   | prev_lsn  | xid | resource_manager | record_type         | rec_len | main_len | fpi_len | description | block_ref
----+-----------+-----------+-----------+-----+------------------+---------------------+---------+----------+---------+-------------+----------
  1 | 0/1532198 | 0/1532210 | 0/1532168 |   0 | XLOG             | CHECKPOINT_SHUTDOWN |     114 |       88 |       0 | redo 0/1532198; tli 1; prev tli 1; fpw true; xid 0:737; oid 16399; multi 1; offset 0; oldest xid 723 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown |
(1 row)
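The planned upgrade-time check this comparison feeds into can be sketched as follows (a hypothetical helper, not pg_upgrade code): a logical slot is safe to migrate only if its confirmed_flush has reached the shutdown checkpoint record, i.e. no decodable WAL remains unconsumed after it.

```python
# Sketch of the caught-up check, using the LSN values shown above.
def lsn(text):
    """Parse PostgreSQL 'X/Y' LSN text into a comparable integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def slot_caught_up(confirmed_flush, shutdown_checkpoint_lsn):
    # The slot has consumed all WAL before the shutdown checkpoint.
    return confirmed_flush >= shutdown_checkpoint_lsn

# HEAD: confirmed_flush 0/1531E60 is behind CHECKPOINT_SHUTDOWN 0/1532198,
# so the check would reject the slot (data before shutdown not consumed).
assert not slot_caught_up(lsn("0/1531E60"), lsn("0/1532198"))

# Patched: confirmed_flush points at the CHECKPOINT_SHUTDOWN record itself,
# so the check passes.
assert slot_caught_up(lsn("0/1532198"), lsn("0/1532198"))
```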
[1] - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm2BboFuFVYxyzP4wkv7%3D8%2B_TwsD%2BugyGhtibTSF4m4XRg%40mail.gmail.com
Regards,
Vignesh
Attachment: v5-0001-pg_replslotdata.patch (text/x-patch, 22.7 KB)