Re: Reviving lost replication slots

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: sirichamarthi22(at)gmail(dot)com, amit(dot)kapila16(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Reviving lost replication slots
Date: 2022-11-09 09:30:36
Message-ID: CALj2ACVUgCYVgKc42GDCFZo_=xjYyDtHXBhzyhT3s4UqAzLK4Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 9, 2022 at 2:02 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> I don't think walsenders fetching segment from archive is totally
> stupid. With that feature, we can use fast and expensive but small
> storage for pg_wal, while avoiding replication from dying even in
> emergency.

It seems like a useful feature to have at least as an option and it
saves a lot of work - failovers, expensive rebuilds of
standbys/subscribers, manual interventions etc.

If you're saying that even the walsenders serving logical replication
subscribers would go fetch from the archive location for the removed
WAL files, it mandates enabling archiving on the subscribers. And we
know that archiving is not cheap and has its own advantages and
disadvantages, so the feature may or may not help.

If you're saying that only the walsenders serving streaming
replication standbys would fetch the removed WAL files from the
archive location, it's easy to implement; however, it is not a
complete feature and doesn't solve the problem for logical
replication.

With the feature, it'll be something like 'you, as primary/publisher,
archive the WAL files and when you don't have them, you'll restore
them'. It may not sound elegant, but it can solve the lost
replication slots problem.

Also, the cost of restoring WAL files from the archive might further
slow down replication, thus increasing the replication lag.

One also needs to think about how many such WAL files are restored and
kept, whether they'll be kept in pg_wal or some other directory, and
how disk-full conditions, fetching too old or too many WAL files for
replication slots lagging behind, removal of unnecessary WAL files,
etc. will be handled.
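
To make the retention question concrete, here is a rough sketch of one
possible policy for a separate directory of restored segments. This is
only an illustration, not a proposal for the patch: prune_restored(),
restore_dir, oldest_needed and max_files are all made-up names, and the
choice to drop the newest restored segments first when over the cap
(since a lagging walsender reads the oldest ones first and the newer
ones can be fetched again) is just an assumption. It relies on WAL
segment file names being 24 hex characters that sort in LSN order
within one timeline.

```python
import os


def prune_restored(restore_dir, oldest_needed, max_files):
    """Hypothetical cleanup of a directory holding segments restored from archive.

    oldest_needed: name of the oldest segment any slot still requires.
    max_files:     cap on how many restored segments may stay on disk.
    Returns the list of segment names that were removed.
    """
    # WAL segment file names are 24 hex characters; within a single
    # timeline, sorting the names sorts the segments by LSN.
    segs = sorted(n for n in os.listdir(restore_dir) if len(n) == 24)

    # Segments older than what the most-lagging slot needs are unnecessary.
    stale = [n for n in segs if n < oldest_needed]

    # Of the still-needed segments, keep only the oldest max_files;
    # the rest can be restored again when the walsender reaches them.
    live = [n for n in segs if n >= oldest_needed]
    surplus = live[max_files:]

    removed = stale + surplus
    for name in removed:
        os.remove(os.path.join(restore_dir, name))
    return removed
```

Even with a cap like this, a slot that lags far behind would keep
cycling segments through the restore directory, so the replication lag
concern above still applies.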

I'm not sure about other implications at this point.

Perhaps, implementing this feature as a core/external extension by
introducing segment_open() or other necessary hooks might be worth it.

If implemented in some way, I think the scope of replication slot
invalidation/max_slot_wal_keep_size feature gets reduced or it can be
removed completely, no?

> However, supposing that WalSndSegmentOpen() fetches segments from
> archive as the fallback and that succeeds, the slot can survive
> missing WAL in pg_wal in the first place. So this patch doesn't seem
> to be needed for the purpose.

That is a simple solution one can think of and provide for streaming
replication standbys. However, is it worth implementing in core,
given the concerns explained above?
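
For reference, the fallback being discussed could look roughly like the
sketch below. It is a toy model in Python, not the actual
WalSndSegmentOpen() code: open_segment() and its directory arguments
are hypothetical, and a plain file copy stands in for whatever
restore_command would really do.

```python
import os
import shutil


def open_segment(wal_dir, archive_dir, restore_dir, segname):
    """Open a WAL segment, falling back to the archive if it was removed.

    Hypothetical helper: the three directories stand in for pg_wal, the
    archive location, and wherever restored segments would be kept.
    """
    # Normal path: the segment is still present locally.
    try:
        return open(os.path.join(wal_dir, segname), "rb")
    except FileNotFoundError:
        pass

    # Fallback: restore the segment from the archive unless a previous
    # request already restored it.
    restored = os.path.join(restore_dir, segname)
    if not os.path.exists(restored):
        tmp = restored + ".tmp"
        # Raises FileNotFoundError if the archive does not have it either.
        shutil.copyfile(os.path.join(archive_dir, segname), tmp)
        # Rename into place so readers never see a partially copied file.
        os.replace(tmp, restored)
    return open(restored, "rb")
```

If the segment is in neither place, the caller still gets an error, so
the slot would be lost only when the archive itself no longer has the
file.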

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
