Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: alvherre(at)alvh(dot)no-ip(dot)org, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
Date: 2022-01-05 06:43:41
Message-ID: 20220105.154341.410117452632618588.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 28 Dec 2021 12:28:07 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> On Wed, Dec 15, 2021 at 8:32 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > Here's the patch that adds a LOG message whenever a replication slot
> > > becomes active and inactive. These logs will be extremely useful on
> > > production servers to debug and analyze inactive replication slot
> > > issues.
> > >
> > > Thoughts?
> >
> > If I create a replication slot, I saw the following lines in server log.
> >
> > [22054:client backend] LOG: replication slot "s1" becomes active
> > [22054:client backend] DETAIL: The process with PID 22054 acquired it.
> > [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
> > [22054:client backend] LOG: replication slot "s1" becomes inactive
> > [22054:client backend] DETAIL: The process with PID 22054 released it.
> > [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1');
> >
> > They are apparently too much as if they were DEBUG3 lines. The
> > process PID shown is of the process the slot operations took place so
> > I think it conveys no information. The STATEMENT lines are also noisy
> > for non-ERROR emssages. Couldn't we hide that line?
> >
> > That is, how about making the log lines as simple as the follows?
> >
> > [17156:walsender] LOG: acquired replication slot "s1"
> > [17156:walsender] LOG: released replication slot "s1"
>
> Thanks for taking a look at the patch. Here's what I've come up with:
>
> for drops:
(two log lines per slot: acquire->drop)
>
> for creates:
(two log lines per slot: create->release)
..

Theses are still needlessly verbose. Even for those who want slot
activities to be logged are not interested in this detail. This is
still debug logs in that sense.

> The amount of logs may seem noisy, but they do help a lot given the
> fact that the server generates much more noise, for instance, the
> server logs the syntax error statements. And, the replication slots
> don't get created/dropped every now and then (at max, the
> pg_basebackup if at all used and configured to take the backups, say,
> every 24hrs or so). With the above set of logs debugging for inactive

In a nearby thread, there was a discussion that checkpoint logs are
too noisy to defaultly turn on. Finally it is committed but I don't
think this is committed as is as it is more verbose (IMV) than the
checkpoint logs. Thus this logs need to be muteable. Meanwhile I
don't think we willingly add a new knob for this feature. I think we
can piggy-back on log_replication_commands for the purpose, changing
its meaning slightly to "log replication commands and related
activity".

> replication slots becomes easier. One can find the root cause and
> answer questions like "why there was a huge WAL file growth at some
> point or when did a replication slot become inactive or how much time
> a replication slot was inactive or when did a standby disconnected and
> connected again or when did a pg_receivewal or pg_recvlogical
> connected and disconnected so on.".

I don't deny it is useful in that cases. If you are fine that the
logs are debug-only, making them DEBUG1 would work. But I don't think
you are fine with that since I think you are going to turn on them on
production systems.

> Here's the v2 patch. Please provide your thoughts.

Thanks! I have three comments on this version.

- I still think "acquire/release" logs on creation/dropping should be
silenced. Adding the third parameter to ReplicationSlotAcquire()
that can mute the acquiring (and as well as the corresponding
releasing) log will work.

- Need to mute the logs by log_replication_commands. (We could add
another choice for the variable for this purpose but I think we
don't need it.)

- The messages are not translatable as the variable parts are
adjectives. They need to consist of static sentences. The
combinations of the two properties are 6 (note that persistence is
tristate) but I don't think we accept that complexity for the
information. So I recommend to just remove the variable parts from
the messages.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-01-05 06:44:54 Re: row filtering for logical replication
Previous Message Shinya Kato 2022-01-05 06:21:47 Re: [Proposal] Add foreign-server health checks infrastructure