Re: Introduce XID age and inactive timeout based replication slot invalidation

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Date: 2024-03-13 07:15:06
Message-ID: CAJpy0uD64X=2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> JFYI, the patch does not apply to the head. There is a conflict in
> multiple files.

For review purposes, I applied v8 to the March 6 code-base. I have yet
to review in detail, please find my initial thoughts:

1)
I found that 'inactive_replication_slot_timeout' works only if there
was any walsender ever started for that slot . The logic is under
'am_walsender' check. Is this intentional?
If I create a slot and use only pg_logical_slot_get_changes or
pg_replication_slot_advance on it, it never gets invalidated due to
timeout. While, when I set 'max_slot_xid_age' or say
'max_slot_wal_keep_size' to a lower value, the said slot is
invalidated correctly with 'xid_aged' and 'wal_removed' reasons
respectively.

Example:
With inactive_replication_slot_timeout=1min, test1_3 is the slot for
which there is no walsender and only advance and get_changes SQL
functions were called; test1_4 is the one for which pg_recvlogical was
run for a second.

test1_3 | 785 | | reserved | | t
| |
test1_4 | 798 | | lost | inactive_timeout | t |
2024-03-13 11:52:41.58446+05:30 |

And when inactive_replication_slot_timeout=0 and max_slot_xid_age=10

test1_3 | 785 | | lost | xid_aged | t
| |
test1_4 | 798 | | lost | inactive_timeout | t |
2024-03-13 11:52:41.58446+05:30 |

2)
The msg for patch 3 says:
--------------
a) when replication slots is lying inactive for a day or so using
last_inactive_at metric,
b) when a replication slot is becoming inactive too frequently using
last_inactive_at metric.
--------------
I think in b, you want to refer to inactive_count instead of last_inactive_at?

3)
I do not see invalidation_reason updated for 2 new reasons in system-views.sgml

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Xing Guo 2024-03-13 07:18:33 Re: Disable LLVM bitcode generation with pgxs.mk framework.
Previous Message Sutou Kouhei 2024-03-13 07:12:00 Re: meson: Specify -Wformat as a common warning flag for extensions