Re: replication_slots usability issue

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: replication_slots usability issue
Date: 2018-10-31 02:10:23
Message-ID: 20181031021023.GA7862@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 30, 2018 at 10:52:54AM -0700, Andres Freund wrote:
> On 2018-10-30 11:51:09 +0900, Michael Paquier wrote:
>> Er... At the same time, shouldn't RestoreSlotFromDisk() *not* use PANIC
>> if more slots are found in pg_replslot than max_replication_slots can
>> handle. A FATAL is fine at startup, PANIC blows up a core file, which
>> is clearly overdoing it if the goal is to give a recommendation at the
>> end.
>
> I can't get particularly excited about this.

This gets only used by the startup process to restore slot information,
so for example if you have an instance with two slots, shut it down,
reduce max_replication_slots to 1, and restart then you generate a core
dump, which is a bit overdoing it to simply let the user know that he
just needs to bump up one parameter and to give him a recommendation to
do so :)

For example:
2018-10-31 11:04:08 JST [9014]: [11-1] db=,user=,app=,client= DEBUG:
starting up replication slots
2018-10-31 11:04:08 JST [9014]: [12-1] db=,user=,app=,client= DEBUG:
restoring replication slot from "pg_replslot/popo3/state"
2018-10-31 11:04:08 JST [9014]: [13-1] db=,user=,app=,client= DEBUG:
restoring replication slot from "pg_replslot/popo2/state"
2018-10-31 11:04:08 JST [9014]: [14-1] db=,user=,app=,client= PANIC:
too many replication slots active before shutdown
2018-10-31 11:04:08 JST [9014]: [15-1] db=,user=,app=,client= HINT:
Increase max_replication_slots and try again.

> I guess we can change it, but I'd only do so in master.

I won't fight hard for a back-patch, now I think that we should
back-patch a fix. Generating a core file if a state is unexpected is
fine because you expect the system to react so, being able to trigger
one based on a parameter switch just to give a recommendation is not in
my opinion.

For example with the attached the system reacts as follows:
2018-10-31 11:07:23 JST [10063]: [14-1] db=,user=,app=,client= FATAL:
too many replication slots active before shutdown
2018-10-31 11:07:23 JST [10063]: [15-1] db=,user=,app=,client= HINT:
Increase max_replication_slots and try again.
2018-10-31 11:07:23 JST [10061]: [6-1] db=,user=,app=,client= LOG:
startup process (PID 10063) exited with exit code 1
2018-10-31 11:07:23 JST [10061]: [7-1] db=,user=,app=,client= LOG:
aborting startup due to startup process failure
--
Michael

Attachment Content-Type Size
slot-restore-fatal.patch text/x-diff 442 bytes

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-10-31 02:26:59 Re: ToDo: show size of partitioned table
Previous Message Amit Langote 2018-10-31 01:50:19 Re: Should pg 11 use a lot more memory building an spgist index?