Impact of checkpointer during pg_upgrade

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: Impact of checkpointer during pg_upgrade
Date: 2023-09-02 04:38:51
Message-ID: CAA4eK1LV3+76CSOAk0h8Kv0AKb-OETsJHe6Sq6172-7DZXf0Qg@mail.gmail.com
Lists: pgsql-hackers

During pg_upgrade, we start the server for the old cluster, which can
allow the checkpointer to remove WAL files. It has been noticed
that we do generate certain types of WAL records (e.g.,
XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT)
even during pg_upgrade for the old cluster, so these additional WAL
records could let the checkpointer decide that certain WAL segments can
be removed (e.g., if the WAL size crosses max_slot_wal_keep_size_mb) and
invalidate the slots. Currently, I can't see any problem with this, but for future
work where we want to migrate logical slots during an upgrade[1], we
need to decide what to do for such cases. The initial idea we had was
that if the old cluster has some invalid slots, we won't allow an
upgrade unless the user removes such slots or uses some option like
--exclude-slots. However, it is quite possible that slots get
invalidated during pg_upgrade itself, through no user activity. Even
though the chance of this is low, I think it is worth considering what
the behavior should be.
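
To make the scenario concrete, here is a minimal toy model in Python,
not PostgreSQL code. It assumes a slot is invalidated once the WAL
retained for it exceeds the limit; the real checkpointer works at WAL
segment granularity and has more conditions. The names Slot and
checkpoint are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class Slot:
    name: str
    restart_lsn: int  # oldest WAL position (in bytes) the slot still needs
    invalidated: bool = False


def checkpoint(slots, current_lsn, max_keep_mb):
    """Toy checkpoint: invalidate slots whose retained WAL exceeds the limit.

    A negative max_keep_mb means "no limit", mirroring the -1 default of
    max_slot_wal_keep_size. Returns the names of newly invalidated slots.
    """
    if max_keep_mb < 0:
        return []
    limit = max_keep_mb * MB
    newly_invalid = []
    for slot in slots:
        # Simplifying assumption: compare raw byte distance, not segments.
        if not slot.invalidated and current_lsn - slot.restart_lsn > limit:
            slot.invalidated = True
            newly_invalid.append(slot.name)
    return newly_invalid
```

With a 64 MB limit and a slot whose restart_lsn is 0, a checkpoint at
63 MB of WAL leaves the slot alone, while a checkpoint at 65 MB
invalidates it. In this model, the extra 2 MB could come entirely from
background records of the kind listed above, with no user activity.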

The other possibilities, apart from not allowing an upgrade in such a
case, could be: (a) before starting the old cluster, we fetch the slots
directly from the disk using some tool like [2] and make the decisions
based on that state; (b) during the upgrade, we don't allow WAL to be
removed if that can invalidate slots; (c) copy/migrate the invalid slots
as well, but for that we need to expose an API to invalidate the
slots; (d) somehow distinguish the slots that are invalidated during
an upgrade and then simply copy such slots, because we anyway ensure
that all the WAL required by the slot is sent before shutdown.
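
As a rough sketch of two of these behaviors, the Python below models
the initial idea (invalid slots block the upgrade unless excluded) and
option (b) (no WAL removal during an upgrade if it would invalidate a
slot). This is a toy model only; Slot, blocking_slots, and
may_remove_wal are hypothetical names, not pg_upgrade or server APIs.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    invalidated: bool = False


def blocking_slots(slots, excluded=()):
    """Initial idea: an invalid slot blocks the upgrade unless the user
    excluded it (e.g., via some option like --exclude-slots)."""
    return [s.name for s in slots if s.invalidated and s.name not in excluded]


def may_remove_wal(would_invalidate_slot, in_upgrade):
    """Option (b): during an upgrade, keep any WAL whose removal would
    invalidate a slot; outside an upgrade, removal proceeds as usual."""
    return not (in_upgrade and would_invalidate_slot)
```

Under the initial idea, a slot invalidated by the checkpointer during
pg_upgrade would show up in blocking_slots() even though the user did
nothing wrong, which is the case option (b) avoids by construction.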

Thoughts?

[1] - https://www.postgresql.org/message-id/TYAPR01MB58664C81887B3AF2EB6B16E3F5939%40TYAPR01MB5866.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/flat/CALj2ACW0rV5gWK8A3m6_X62qH%2BVfaq5hznC%3Di0R5Wojt5%2Byhyw%40mail.gmail.com

--
With Regards,
Amit Kapila.
