Re: Connection slots reserved for replication

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: cyberdemn(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Connection slots reserved for replication
Date: 2018-11-08 12:29:54
Message-ID: 20181108.212954.32574929.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello.

At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoASCq808+iqcFoVuLu-+i8kon=6wN3+sY=EVKGm-56qig(at)mail(dot)gmail(dot)com>
> On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > InitializeMaxBackends()
> > MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > - max_worker_processes;
> > + max_worker_processes + replication_reserved_connections;
> >
> > This means walsender doesn't consume a connection, which is
> > different from the current behavior. We should reserve a part of
> > MaxConnections for walsenders. (in PostmasterMain,
> > max_wal_senders is counted as a part of MaxConnections)
>
> Yes. We can force replication_reserved_connections <= max_wal_senders
> and then reserved connections for replication should be a part of
> MaxConnections.
>
> >
> > + if (am_walsender && replication_reserved_connections < max_wal_senders
> > + && *procgloballist == NULL)
> > + procgloballist = &ProcGlobal->freeProcs;
> >
> > Currently an excessive number of walsenders is rejected in
> > InitWalSenderSlot, which emits the following error.
> >
> > > ereport(FATAL,
> > > (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > errmsg("number of requested standby connections "
> > > "exceeds max_wal_senders (currently %d)",
> > > max_wal_senders)));
> >
> > With this patch, if max_wal_senders =
> > replication_reserved_connections = 3 and the fourth walreceiver
> > comes, we will get "FATAL: sorry, too many clients already"
> > instead. It should be fixed.
> >
> > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > walsenders are active, in an extreme case where a new replication
> > connection comes at the same time another is exiting, we could
> > end up using two normal slots despite that one slot is vacant in
> > reserved slots.
>
> Doesn't the max_wal_senders prevent the case?

Currently the variable doesn't work that way. We first accept the
connection request, then search for a vacant slot in
InitWalSenderSlot, and reject the connection if no room is
available. Even with this patch, we don't keep an accurate count
of active walsenders (for performance reasons). If the reserved
slots are filled and r_r_conn < max_wal_senders, there's no way
other than to accept the connection using a non-reserved slot.
Now suppose one of the active walsenders goes away after we
allocated the non-reserved connection slot but before
InitWalSenderSlot starts searching the walsnds[] array. The new
walsender on the non-reserved slot is then activated, and one
reserved slot is left empty. That is the "extreme case". We could
ignore it.
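
To make the sequence concrete, here is a toy model of the race. It is
only a sketch, not PostgreSQL code: SlotModel, acquire_conn_slot,
init_walsender_slot, release_walsender and simulate_race are all
hypothetical names, the normal pool is assumed to always have room, and
everything runs single-threaded with the interleaving written out by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the two allocation steps; not PostgreSQL code. */
typedef struct SlotModel
{
    int reserved_total;     /* r_r_conn */
    int reserved_used;      /* reserved connection slots in use */
    int normal_used;        /* normal connection slots held by walsenders */
    int max_wal_senders;
    int active_walsenders;  /* entries in use in the walsender array */
} SlotModel;

typedef enum { SLOT_NONE, SLOT_RESERVED, SLOT_NORMAL } SlotKind;

/* Step 1: take a connection slot, reserved first, otherwise normal. */
static SlotKind
acquire_conn_slot(SlotModel *m)
{
    if (m->reserved_used < m->reserved_total)
    {
        m->reserved_used++;
        return SLOT_RESERVED;
    }
    /* Fallback is possible only when r_r_conn < max_wal_senders. */
    if (m->reserved_total < m->max_wal_senders)
    {
        m->normal_used++;       /* assumes the normal pool has room */
        return SLOT_NORMAL;
    }
    return SLOT_NONE;
}

/* Step 2: stands in for the search done in InitWalSenderSlot. */
static bool
init_walsender_slot(SlotModel *m)
{
    if (m->active_walsenders >= m->max_wal_senders)
        return false;
    m->active_walsenders++;
    return true;
}

/* A walsender exits and gives back both of its slots. */
static void
release_walsender(SlotModel *m, SlotKind kind)
{
    assert(kind != SLOT_NONE);
    if (kind == SLOT_RESERVED)
        m->reserved_used--;
    else
        m->normal_used--;
    m->active_walsenders--;
}

/* r_r_conn = 2, max_wal_senders = 3, with the unlucky interleaving. */
static SlotModel
simulate_race(void)
{
    SlotModel m = { .reserved_total = 2, .max_wal_senders = 3 };
    SlotKind  k;
    bool      ok;
    int       i;

    /* Three walsenders start: two on reserved slots, one on a normal slot. */
    for (i = 0; i < 3; i++)
    {
        k = acquire_conn_slot(&m);
        assert(k != SLOT_NONE);
        ok = init_walsender_slot(&m);
        assert(ok);
    }

    /* A fourth connection arrives and is handed a normal slot ... */
    k = acquire_conn_slot(&m);
    assert(k == SLOT_NORMAL);

    /* ... and before step 2, a walsender on a reserved slot exits. */
    release_walsender(&m, SLOT_RESERVED);

    /* The newcomer now finds room in the walsender array. */
    ok = init_walsender_slot(&m);
    assert(ok);
    (void) ok;
    (void) k;

    return m;
}
```

In this model simulate_race() ends with three active walsenders, two of
them on normal slots, while one reserved slot stays vacant.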

I doubt that we should allow a setting where r_r_conn <
max_wal_senders, or even r_r_conn != max_wal_senders. We don't
have this kind of problem if we disallow those cases.
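
The two candidate restrictions can be written down as a small check.
This is only an illustration: ReservePolicy and reserved_setting_ok are
hypothetical names, and nothing here says where such a check would live
in the server.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: which restriction on r_r_conn is being enforced. */
typedef enum
{
    RR_AT_MOST,         /* r_r_conn <= max_wal_senders */
    RR_STRICT_EQUAL     /* r_r_conn == max_wal_senders */
} ReservePolicy;

/* Returns true if the pair of settings is acceptable under the policy. */
static bool
reserved_setting_ok(int r_r_conn, int max_wal_senders, ReservePolicy policy)
{
    assert(r_r_conn >= 0 && max_wal_senders >= 0);

    if (policy == RR_STRICT_EQUAL)
        return r_r_conn == max_wal_senders;
    return r_r_conn <= max_wal_senders;
}
```

Under RR_STRICT_EQUAL a walsender never needs to fall back to a normal
slot, so the vacant-reserved-slot case cannot arise.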

> Wal senders can get connection if we have free procs more than
> (MaxConnections - reserved for superusers). So I think for normal
> users the connection must be refused if (MaxConnections - (reserved
> for superuser and replication) > # of freeprocs) and for wal senders
> the connection also must be refused if (MaxConnections - (reserved for
> superuser) > # of freeprocs). I'm not sure we need such trick in
> InitWalSenderSlot().

(For clarity, I don't mean my previous patch is a good solution.)

It works as long as we accept that some reserved slots can be left
unused even though some walsenders are using normal slots. (Merely
exiting a walsender that uses a reserved slot causes this, but the
slot is usually taken by a walsender that comes later.)

Another idea is to acquire a walsnds[] slot before getting a
connection slot.
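
A rough sketch of that ordering, again as a toy model rather than a
patch: Pools, ConnResult and start_walsender are hypothetical names, and
locking is left out entirely. The point is only that the max_wal_senders
limit is checked first, so an excess walsender gets the walsender-specific
rejection instead of the generic "too many clients" one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for the toy model. */
typedef enum
{
    CONN_OK,
    CONN_TOO_MANY_WALSENDERS,   /* would map to the max_wal_senders error */
    CONN_TOO_MANY_CLIENTS       /* would map to "too many clients already" */
} ConnResult;

typedef struct Pools
{
    int max_wal_senders;
    int active_walsenders;  /* walsender entries currently claimed */
    int free_conn_slots;    /* connection slots a walsender may take */
} Pools;

static ConnResult
start_walsender(Pools *p)
{
    assert(p->active_walsenders >= 0 && p->free_conn_slots >= 0);

    /* Step 1: claim the walsender entry first. */
    if (p->active_walsenders >= p->max_wal_senders)
        return CONN_TOO_MANY_WALSENDERS;
    p->active_walsenders++;

    /* Step 2: only then take a connection slot. */
    if (p->free_conn_slots == 0)
    {
        p->active_walsenders--; /* undo step 1 on failure */
        return CONN_TOO_MANY_CLIENTS;
    }
    p->free_conn_slots--;
    return CONN_OK;
}
```

With max_wal_senders = 3, the fourth call returns
CONN_TOO_MANY_WALSENDERS without touching any connection slot.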

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
