persist logical slots to disk during shutdown checkpoint

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: persist logical slots to disk during shutdown checkpoint
Date: 2023-08-19 06:16:38
Message-ID: CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Lists: pgsql-hackers

It's entirely possible for a logical slot to have a confirmed_flush
LSN higher than the last value saved on disk while not being marked as
dirty. It's currently not a problem to lose that value during a clean
shutdown / restart cycle, but to support the upgrade of logical slots
[1] (see latest patch at [2]), we seem to rely on that value being
properly persisted to disk. During the upgrade, we need to verify that
all the data prior to the shutdown_checkpoint has been consumed by the
logical slots; otherwise, the downstream may miss some data. To
ensure that, we are planning to compare the confirmed_flush LSN
with the latest shutdown_checkpoint location, which means that
the confirmed_flush LSN must still be up to date after the restart.
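
A minimal sketch of that comparison, to make the dependency on the
persisted value concrete. Everything here is hypothetical: Lsn stands in
for XLogRecPtr, slot_caught_up() is not a function from the patch, and
the >= test is only an assumption about how the check might be written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr, a 64-bit WAL position. */
typedef uint64_t Lsn;

/*
 * Hypothetical upgrade-time check: has the slot confirmed everything up
 * to the shutdown checkpoint? The >= comparison is an assumption made
 * for this sketch, not taken from the patch.
 */
static bool
slot_caught_up(Lsn confirmed_flush, Lsn shutdown_checkpoint)
{
    assert(shutdown_checkpoint != 0);   /* 0 plays the role of an invalid LSN */
    return confirmed_flush >= shutdown_checkpoint;
}
```

If the in-memory confirmed_flush had reached the checkpoint (say 1000)
but only an older value (say 900) was on disk, the check run against the
value read back after the restart would fail even though the subscriber
had consumed everything.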

I think not persisting it is inefficient even without an upgrade
because, after a restart, it may lead to decoding some data again. Say
we process some transactions for which we didn't send anything
downstream (the changes got filtered) but the confirmed_flush LSN is
updated due to keepalives. As we don't flush the latest value of the
confirmed_flush LSN, a restart may lead to processing the same changes
again.

The idea discussed in the thread [1] is to always persist logical
slots to disk during the shutdown checkpoint. I have extracted the
patch that does this from that thread and attached it here. This
could add some overhead to the shutdown checkpoint if there are
many slots, but it is probably a one-time cost.

I couldn't think of better ideas, but another possibility is to mark
the slot as dirty whenever we update the confirmed_flush LSN (see
LogicalConfirmReceivedLocation()). However, that would add more
overhead in the running server, as it could be a frequent operation and
could lead to more writes.
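
A rough way to see the difference in write volume is the sketch below.
It is a hypothetical model, not a measurement: Policy and
count_slot_writes() are invented for illustration, it counts only
checkpoint-time saves of a single slot, and it assumes the subscriber
confirms new WAL between every pair of checkpoints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;               /* stand-in for XLogRecPtr */

/* Hypothetical policies for when a confirmed_flush advance gets saved. */
typedef enum
{
    POLICY_SHUTDOWN_ONLY,           /* save a stale LSN only at shutdown */
    POLICY_DIRTY_ON_CONFIRM         /* mark the slot dirty on every advance */
} Policy;

/*
 * Count slot saves across `checkpoints` regular checkpoints followed by
 * one shutdown checkpoint, with one confirmation before each of them.
 */
static int
count_slot_writes(Policy policy, int checkpoints)
{
    Lsn  mem = 0;                   /* in-memory confirmed_flush */
    Lsn  disk = 0;                  /* on-disk confirmed_flush */
    bool dirty = false;
    int  writes = 0;

    assert(checkpoints >= 0);

    for (int i = 0; i <= checkpoints; i++)
    {
        bool is_shutdown = (i == checkpoints);

        mem += 10;                  /* subscriber confirms more WAL */
        if (policy == POLICY_DIRTY_ON_CONFIRM)
            dirty = true;
        if (is_shutdown && mem != disk)
            dirty = true;           /* the final value is always saved */

        if (dirty)
        {
            disk = mem;
            dirty = false;
            writes++;
        }
    }
    return writes;
}
```

With five regular checkpoints before shutdown, the shutdown-only policy
saves the slot once, while marking the slot dirty on every advance saves
it at each checkpoint, six times in total.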

Thoughts?

[1] - https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/TYAPR01MB5866562EF047F2C9DDD1F9DEF51BA%40TYAPR01MB5866.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

Attachment: v1-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch (application/octet-stream, 4.8 KB)
