Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
Date: 2021-12-28 06:58:07
Message-ID: CALj2ACWJj-rRV0BwMb-O+G9zonzX3n=-6=JxGPYndMDNSrmgcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > Here's the patch that adds a LOG message whenever a replication slot
> > becomes active and inactive. These logs will be extremely useful on
> > production servers to debug and analyze inactive replication slot
> > issues.
> >
> > Thoughts?
>
> If I create a replication slot, I saw the following lines in server log.
>
> [22054:client backend] LOG: replication slot "s1" becomes active
> [22054:client backend] DETAIL: The process with PID 22054 acquired it.
> [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
> [22054:client backend] LOG: replication slot "s1" becomes inactive
> [22054:client backend] DETAIL: The process with PID 22054 released it.
> [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
>
> They are apparently too much as if they were DEBUG3 lines. The
> process PID shown is of the process the slot operations took place so
> I think it conveys no information. The STATEMENT lines are also noisy
> for non-ERROR emssages. Couldn't we hide that line?
>
> That is, how about making the log lines as simple as the follows?
>
> [17156:walsender] LOG: acquired replication slot "s1"
> [17156:walsender] LOG: released replication slot "s1"

Thanks for taking a look at the patch. Here's what I've come up with:

for drops:
2021-12-28 06:39:34.963 UTC [2541600] LOG: acquired persistent
physical replication slot "myslot1"
2021-12-28 06:39:34.980 UTC [2541600] LOG: dropped persistent
physical replication slot "myslot1"
2021-12-28 06:47:39.994 UTC [2544153] LOG: acquired persistent
logical replication slot "myslot2"
2021-12-28 06:47:40.003 UTC [2544153] LOG: dropped persistent logical
replication slot "myslot2"

for creates:
2021-12-28 06:39:46.859 UTC [2541600] LOG: created persistent
physical replication slot "myslot1"
2021-12-28 06:39:46.859 UTC [2541600] LOG: released persistent
physical replication slot "myslot1"
2021-12-28 06:45:20.037 UTC [2544153] LOG: created persistent logical
replication slot "myslot2"
2021-12-28 06:45:20.058 UTC [2544153] LOG: released persistent
logical replication slot "myslot2"

for pg_basebackup:
2021-12-28 06:41:04.601 UTC [2542686] LOG: created temporary physical
replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.602 UTC [2542686] LOG: released temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.602 UTC [2542686] LOG: acquired temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.867 UTC [2542686] LOG: released temporary
physical replication slot "pg_basebackup_2542686"
2021-12-28 06:41:04.954 UTC [2542686] LOG: dropped temporary physical
replication slot "pg_basebackup_2542686"

The amount of logs may seem noisy, but they do help a lot given the
fact that the server generates much more noise, for instance, the
server logs the syntax error statements. And, the replication slots
don't get created/dropped every now and then (at max, the
pg_basebackup if at all used and configured to take the backups, say,
every 24hrs or so). With the above set of logs debugging for inactive
replication slots becomes easier. One can find the root cause and
answer questions like "why there was a huge WAL file growth at some
point or when did a replication slot become inactive or how much time
a replication slot was inactive or when did a standby disconnected and
connected again or when did a pg_receivewal or pg_recvlogical
connected and disconnected so on.".

Here's the v2 patch. Please provide your thoughts.

Regards,
Bharath Rupireddy.

Attachment Content-Type Size
v2-0001-Add-LOG-messages-when-replication-slots-become-ac.patch application/octet-stream 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Ladhe 2021-12-28 07:41:53 Re: refactoring basebackup.c
Previous Message Fujii Masao 2021-12-28 06:56:22 Re: sequences vs. synchronous replication