Re: Introduce XID age and inactive timeout based replication slot invalidation

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-13 07:21:18
Message-ID: ZfFT7tgWpqx7oZko@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers

Hi,

On Tue, Mar 12, 2024 at 09:19:35PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 12, 2024 at 9:11 PM Bertrand Drouvot
> <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> >
> > > AFAIR, we don't prevent similar invalidations due to
> > > 'max_slot_wal_keep_size' for sync slots,
> >
> > Right, we'd invalidate them on the standby should the standby sync slot restart_lsn
> > exceeds the limit.
>
> Right. Help me understand this a bit - is the wal_removed invalidation
> going to conflict with recovery on the standby?

I don't think so, as it's not directly related to recovery. The slot will
be invalidated on the standby though.

> Per the discussion upthread, I'm trying to understand what
> invalidation reasons will exactly cause conflict with recovery? Is it
> just rows_removed and wal_level_insufficient invalidations?

Yes, those are the ones added in be87200efd.

See the error messages on a standby:

== wal removal

postgres=# SELECT * FROM pg_logical_slot_get_changes('lsub4_slot', NULL, NULL, 'include-xids', '0');
ERROR: can no longer get changes from replication slot "lsub4_slot"
DETAIL: This slot has been invalidated because it exceeded the maximum reserved size.

== wal level

postgres=# select conflict_reason from pg_replication_slots where slot_name = 'lsub5_slot';;
conflict_reason
------------------------
wal_level_insufficient
(1 row)

postgres=# SELECT * FROM pg_logical_slot_get_changes('lsub5_slot', NULL, NULL, 'include-xids', '0');
ERROR: can no longer get changes from replication slot "lsub5_slot"
DETAIL: This slot has been invalidated because it was conflicting with recovery.

== rows removal

postgres=# select conflict_reason from pg_replication_slots where slot_name = 'lsub6_slot';;
conflict_reason
-----------------
rows_removed
(1 row)

postgres=# SELECT * FROM pg_logical_slot_get_changes('lsub6_slot', NULL, NULL, 'include-xids', '0');
ERROR: can no longer get changes from replication slot "lsub6_slot"
DETAIL: This slot has been invalidated because it was conflicting with recovery.

As you can see, only the wal level and rows removal cases mention a conflict
with recovery.
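
To make the comparison explicit, here is a minimal sketch (not server code) that
checks which of the DETAIL messages quoted above blame a conflict with recovery.
The message strings are copied from the sessions in this mail; the dictionary
and the helper names are hypothetical, chosen only for this illustration.

```python
# DETAIL messages copied from the standby sessions quoted in this mail.
# The keys are the invalidation reasons discussed in the thread; the
# dictionary itself is an illustration, not a server data structure.
DETAIL_BY_REASON = {
    "wal_removed": (
        "This slot has been invalidated because it exceeded "
        "the maximum reserved size."
    ),
    "wal_level_insufficient": (
        "This slot has been invalidated because it was "
        "conflicting with recovery."
    ),
    "rows_removed": (
        "This slot has been invalidated because it was "
        "conflicting with recovery."
    ),
}


def mentions_recovery_conflict(detail: str) -> bool:
    """Return True if the DETAIL text blames a conflict with recovery."""
    return "conflicting with recovery" in detail


def reasons_reporting_conflict(details: dict[str, str]) -> list[str]:
    """Hypothetical helper: sorted reasons whose DETAIL mentions recovery."""
    return sorted(
        reason
        for reason, detail in details.items()
        if mentions_recovery_conflict(detail)
    )


if __name__ == "__main__":
    # Prints ['rows_removed', 'wal_level_insufficient']
    print(reasons_reporting_conflict(DETAIL_BY_REASON))
```

wal_removed is absent from the result, which is the mismatch behind the
question below.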

So, are we already "wrong" to mention "wal_removed" in conflict_reason?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
