Re: more descriptive message for process termination due to max_slot_wal_keep_size

From: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: <sawada(dot)mshk(at)gmail(dot)com>, <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more descriptive message for process termination due to max_slot_wal_keep_size
Date: 2022-09-06 08:54:35
Message-ID: 03ee95c3-6496-90b7-5c8b-13fe69ee73dd@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 9/6/22 7:53 AM, Kyotaro Horiguchi wrote:
> At Mon, 5 Sep 2022 11:56:33 +0200, "Drouvot, Bertrand"<bdrouvot(at)amazon(dot)com> wrote in
>> Hi,
>>
>> On 3/2/22 7:37 AM, Kyotaro Horiguchi wrote:
>>> At Tue, 04 Jan 2022 10:29:31 +0900 (JST), Kyotaro
>>> Horiguchi<horikyota(dot)ntt(at)gmail(dot)com> wrote in
>>>> So what do you say if I propose the following?
>>>>
>>>> LOG: terminating process %d to release replication slot \"%s\"
>>>> because its restart_lsn %X/%X exceeds the limit %X/%X
>>>> HINT: You might need to increase max_slot_wal_keep_size.
>>> This version emits the following message.
>>>
>>> [35785:checkpointer] LOG: terminating process 36368 to release
>>> replication slot "s1" because its restart_lsn 0/1F000148 exceeds the
>>> limit 0/21000000
>>> [35785:checkpointer] HINT: You might need to increase
>>> max_slot_wal_keep_size.
>> As the hint is to increase max_slot_wal_keep_size, what about
>> reporting the difference in size (rather than the limit lsn)?
>> Something along those lines?
>>
>> [35785:checkpointer] LOG: terminating process 36368 to release
>> replication slot "s1" because its restart_lsn 0/1F000148 exceeds the
>> limit by <NNN MB>.
> Thanks! That might be more sensible exactly for the reason you
> mentioned. One issue doing that is size_pretty is dbsize.c local
> function. Since the size is less than kB in many cases, we cannot use
> fixed unit for that.

Thanks for the new patch version!. I did not realized (sorry about that)
that we'd need to expose byte_size_pretty(). Now I wonder if we should
not simply report the number of bytes (like I can see it is done in many
places). So something like:

@@ -1298,9 +1298,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s,
XLogRecPtr oldestLSN,
                                byte_size_pretty(buf, sizeof(buf),
oldestLSN - restart_lsn);
                                ereport(LOG,
- (errmsg("terminating process %d to release replication slot \"%s\"
because its restart_lsn %X/%X exceeds the limit by %s",
+ (errmsg("terminating process %d to release replication slot \"%s\"
because its restart_lsn %X/%X exceeds the limit by %lu bytes",
active_pid, NameStr(slotname),
- LSN_FORMAT_ARGS(restart_lsn), buf),
+ LSN_FORMAT_ARGS(restart_lsn), oldestLSN - restart_lsn),
                                                 errhint("You might
need to increase max_slot_wal_keep_size.")));

and then forget about exposing/using byte_size_pretty() (that would be
more consistent with the same kind of reporting in the existing code).

What do you think?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services:https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message kuroda.hayato@fujitsu.com 2022-09-06 09:16:56 RE: test_decoding assertion failure for the loss of top-sub transaction relationship
Previous Message Amit Kapila 2022-09-06 08:45:22 Re: Column Filtering in Logical Replication