Reviving lost replication slots

From: sirisha chamarthi <sirichamarthi22(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Reviving lost replication slots
Date: 2022-11-04 08:10:39
Message-ID: CAKrAKeW-sGqvkw-2zKuVYiVv=EOG4LEqJn01RJPsHfS2rQGYng@mail.gmail.com
Lists: pgsql-hackers

Hi,

A replication slot can be lost when a subscriber cannot keep up with the load
on the primary and the WAL it still needs exceeds max_slot_wal_keep_size.
When this happens, the target has to be reseeded from scratch (pg_dump),
which can take a long time. I am investigating options to revive a lost
slot. With the attached patch, and after copying the WAL files from the
archive back into the pg_wal directory, I was able to revive the lost slot.
I also verified that a lost slot still prevents VACUUM from removing catalog
tuples deleted by transactions later than its catalog_xmin.

One side effect of this approach is that the checkpointer creates .ready
files in the archive_status directory for the copied WAL files, so the
archive command has to handle segments that are already archived. Also, the
checkpointer may remove a copied file again before the subscriber has
consumed it. In the proposed patch, I do not set restart_lsn to
InvalidXLogRecPtr; instead, I rely on the invalidated_at field to tell
whether the slot is lost. Was the intent of setting restart_lsn to
InvalidXLogRecPtr to disallow reviving the slot?
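To make the trade-off in the question concrete, here is a minimal C sketch of the slot lifecycle described above. It is not PostgreSQL code and not the attached patch. `SlotState`, `slot_exceeds_keep_size()`, `invalidate_slot()`, `slot_is_lost()` and `try_revive_slot()` are hypothetical, and the retention check is a simplified, segment-granular model of max_slot_wal_keep_size that ignores wal_keep_size and checkpoint timing. The sketch assumes a revival resumes from restart_lsn. Under that assumption, a slot whose restart_lsn was cleared cannot be revived, while a slot that keeps restart_lsn and is flagged only through invalidated_at can be revived once the WAL from restart_lsn is available again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for XLogRecPtr: a 64-bit WAL byte position. */
typedef uint64_t WalPtr;

/* Mirrors InvalidXLogRecPtr, which PostgreSQL defines as 0. */
#define INVALID_WAL_PTR ((WalPtr) 0)

/* Hypothetical subset of a slot's persistent state; field names follow the email. */
typedef struct SlotState
{
    WalPtr restart_lsn;    /* oldest WAL position the slot still needs */
    WalPtr invalidated_at; /* in this sketch: restart_lsn when the slot was invalidated */
} SlotState;

/*
 * Simplified model of the max_slot_wal_keep_size limit: segments more than
 * keep_mb behind the current segment may be removed. A negative keep_mb
 * means "no limit" (the setting's default is -1).
 */
static bool
slot_exceeds_keep_size(WalPtr current_lsn, WalPtr restart_lsn,
                       int64_t keep_mb, uint64_t wal_segment_size)
{
    assert(wal_segment_size > 0);
    if (keep_mb < 0)
        return false;

    uint64_t current_seg = current_lsn / wal_segment_size;
    uint64_t restart_seg = restart_lsn / wal_segment_size;
    uint64_t keep_segs = ((uint64_t) keep_mb * 1024 * 1024) / wal_segment_size;

    if (current_seg <= keep_segs)
        return false;  /* not enough WAL written yet for anything to be removed */
    return restart_seg < current_seg - keep_segs;
}

/* Mark the slot lost; clear_restart_lsn = true models setting restart_lsn to InvalidXLogRecPtr. */
static void
invalidate_slot(SlotState *s, bool clear_restart_lsn)
{
    s->invalidated_at = s->restart_lsn;
    if (clear_restart_lsn)
        s->restart_lsn = INVALID_WAL_PTR;
}

/* As proposed in the email: "lost" is read from invalidated_at, not restart_lsn. */
static bool
slot_is_lost(const SlotState *s)
{
    return s->invalidated_at != INVALID_WAL_PTR;
}

/*
 * Hypothetical revival: succeeds only if restart_lsn was kept and WAL from
 * that point is available again (e.g. segments copied back from the archive).
 */
static bool
try_revive_slot(SlotState *s, WalPtr oldest_available_lsn)
{
    if (!slot_is_lost(s) || s->restart_lsn == INVALID_WAL_PTR)
        return false;
    if (oldest_available_lsn > s->restart_lsn)
        return false;  /* required WAL is still missing */
    s->invalidated_at = INVALID_WAL_PTR;
    return true;
}
```

For example, with 16 MB segments and max_slot_wal_keep_size = 64 MB, a slot whose restart_lsn is in segment 5 is over the limit once the server reaches segment 10. If invalidate_slot(&s, false) keeps restart_lsn, try_revive_slot() succeeds once segment 5 is back in pg_wal. If the slot was invalidated with clear_restart_lsn = true, there is no resume point left to check.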

If the overall direction seems OK, I will continue working on reviving the
slot by copying the WAL files from the archive. I'd appreciate your feedback.
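Copying segments back from the archive causes the checkpointer to create .ready files for them, so the archive command will be asked to archive segments that are already archived. Below is a minimal C sketch of an idempotent archive step. It assumes a hypothetical setup in which archive_command runs a small program built around `archive_wal_file()`. Both `archive_wal_file()` and `files_identical()` are illustrative helpers, not PostgreSQL APIs. The step never overwrites an existing archive file. If the existing copy has identical contents, it reports success. If the contents differ, it reports an error.

```c
#include <stdio.h>

/* Compare two files byte by byte: 1 if identical, 0 if different, -1 if either cannot be opened. */
static int
files_identical(const char *a, const char *b)
{
    FILE *fa = fopen(a, "rb");
    FILE *fb = fopen(b, "rb");
    int result = -1;

    if (fa && fb)
    {
        int ca, cb;
        do
        {
            ca = fgetc(fa);
            cb = fgetc(fb);
        } while (ca == cb && ca != EOF);
        result = (ca == cb);  /* both reached EOF together => identical */
    }
    if (fa)
        fclose(fa);
    if (fb)
        fclose(fb);
    return result;
}

/*
 * Hypothetical archive step: returns 0 on success, nonzero on failure.
 * An already-archived segment with identical contents counts as success,
 * so re-archiving a segment restored from the archive is harmless.
 */
static int
archive_wal_file(const char *src_path, const char *dst_path)
{
    FILE *probe = fopen(dst_path, "rb");
    if (probe)
    {
        fclose(probe);
        return files_identical(src_path, dst_path) == 1 ? 0 : 1;
    }

    FILE *in = fopen(src_path, "rb");
    if (!in)
        return 1;
    FILE *out = fopen(dst_path, "wb");
    if (!out)
    {
        fclose(in);
        return 1;
    }

    char buf[8192];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            rc = 1;
            break;
        }
    }
    if (ferror(in))
        rc = 1;
    fclose(in);
    if (fclose(out) != 0)
        rc = 1;
    return rc;
}
```

For brevity, the sketch omits what a production archive command also needs: fsync of the copied file, writing to a temporary name and renaming it, and avoiding the race between the existence check and the copy.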

Thanks,
Sirisha

Attachment: 0001-Allow-revive-a-lost-replication-slot.patch (application/octet-stream, 1.3 KB)
