Re: Introduce XID age and inactive timeout based replication slot invalidation

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-13 09:15:14
Message-ID: CAJpy0uA_NYdQdYuT+aWQMjefeQ=LTj-i3HBf1NMjiR9b+x6VPA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 8, 2024 at 10:42 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > You might want to consider its interaction with sync slots on standby.
> > Say, there is no activity on slots in terms of processing the changes
> > for slots. Now, we won't perform sync of such slots on standby showing
> > them inactive as per your new criteria whereas the same slots could still
> > be valid on primary as the walsender is still active. This may be more
> > of a theoretical point as in a running system there will probably be
> > some activity but I think this needs some thought.
>
> I believe the xmin and catalog_xmin of the sync slots on the standby
> keep advancing depending on the slots on the primary, no? If yes, the
> XID age based invalidation shouldn't be a problem.

If the user has not enabled the slot-sync worker and is relying on the SQL
function pg_sync_replication_slots(), then the xmin and catalog_xmin
of synced slots may not keep advancing. They will be advanced only
on the next run of the function. Meanwhile, the synced slots may be
invalidated due to 'xid_aged'. The next time the user runs
pg_sync_replication_slots(), the invalidated slots will be
dropped and recreated by this SQL function (provided they are
valid on the primary and invalidated only on the standby). I am not
stating that this is a problem, but we need to think about whether this
is what we want. Secondly, the behaviour is not the same for
'inactive_timeout' invalidation. Synced slots are immune to
'inactive_timeout' invalidation because it happens only in the
walsender, while they are not immune to 'xid_aged' invalidation. So
again, this needs some thought.
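To make the two asymmetries above concrete, here is a toy Python model of the timeline. It is a hypothetical sketch, not PostgreSQL's implementation: `Slot`, `check_xid_age`, `check_inactive_timeout`, `sync_slots`, and `MAX_XID_AGE` are invented names, XIDs are plain integers with no wraparound, and `sync_slots` only mimics what pg_sync_replication_slots() is described as doing here (drop slots invalidated on the standby alone and recreate them from still-valid primary slots).

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical threshold standing in for the patch's XID-age limit (not a real GUC name).
MAX_XID_AGE = 1000


@dataclass
class Slot:
    name: str
    catalog_xmin: int
    synced: bool = False                # True for a copy synced onto the standby
    invalidated: Optional[str] = None   # invalidation cause, or None if valid


def check_xid_age(slot: Slot, next_xid: int, max_age: int = MAX_XID_AGE) -> None:
    """Invalidate a slot whose catalog_xmin lags next_xid by more than max_age.

    Not limited to walsenders, so synced slots on a standby are subject to it too.
    """
    if slot.invalidated is None and next_xid - slot.catalog_xmin > max_age:
        slot.invalidated = "xid_aged"


def check_inactive_timeout(slot: Slot, idle_secs: float, timeout_secs: float,
                           in_walsender: bool) -> None:
    """Invalidate an idle slot, but only when the check runs in a walsender."""
    if in_walsender and slot.invalidated is None and idle_secs > timeout_secs:
        slot.invalidated = "inactive_timeout"


def sync_slots(standby: Dict[str, Slot], primary: Dict[str, Slot]) -> None:
    """One manual sync call: copy valid primary slots to the standby.

    A standby copy invalidated locally is dropped and recreated from the
    primary; a valid copy just has its catalog_xmin advanced. Primary slots
    that are themselves invalidated are skipped for brevity.
    """
    for name, src in primary.items():
        if src.invalidated is not None:
            continue
        dst = standby.get(name)
        if dst is not None and dst.invalidated is not None:
            del standby[name]               # drop the locally invalidated copy
            dst = None
        if dst is None:
            standby[name] = Slot(name, src.catalog_xmin, synced=True)
        else:
            dst.catalog_xmin = src.catalog_xmin


# Timeline from the discussion: slot-sync worker disabled, sync run manually.
primary = {"sub1": Slot("sub1", catalog_xmin=100)}
standby: Dict[str, Slot] = {}
sync_slots(standby, primary)                   # first manual sync copies xmin 100

primary["sub1"].catalog_xmin = 1500            # primary walsender keeps advancing
check_xid_age(primary["sub1"], next_xid=1600)  # age 100: stays valid
check_xid_age(standby["sub1"], next_xid=1600)  # age 1500: invalidated
print(standby["sub1"].invalidated)             # xid_aged

stale = standby["sub1"]
sync_slots(standby, primary)                   # next manual sync drops and recreates
print(standby["sub1"] is stale, standby["sub1"].catalog_xmin)  # False 1500

# Idle synced slot: no walsender on the standby, so the inactive timeout never fires.
check_inactive_timeout(standby["sub1"], idle_secs=3600, timeout_secs=60,
                       in_walsender=False)
print(standby["sub1"].invalidated)             # None
```

The model gates the inactive-timeout check on `in_walsender` and leaves the XID-age check ungated, which is exactly the asymmetry in question: the same idle synced slot survives one criterion but is dropped and recreated because of the other.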

> I believe there are no walsenders started for the sync slots on the
> standbys, right? If yes, the inactive timeout based invalidation also
> shouldn't be a problem. Because, the inactive timeouts for a slot are
> tracked only for walsenders because they are the ones that typically
> hold replication slots for longer durations and for real replication
> use. We did a similar thing in a recent commit [1].
>
> Is my understanding right? Do you still see any problems with it?

I have explained the situation above so that we can think it over better.

thanks
Shveta
