Re: Connection slots reserved for replication

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Connection slots reserved for replication
Date: 2018-11-20 16:41:33
Message-ID: CABUevEwzNEmWnFmuWifyBVcKd4itad9wKmrQ_XP6Ttp=ZSK-6A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 9, 2018 at 2:02 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Thu, Nov 8, 2018 at 9:30 PM Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >
> > Hello.
> >
> > At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> > wrote in <CAD21AoASCq808+iqcFoVuLu-+i8kon=6wN3+sY=EVKGm-56qig(at)mail(dot)gmail(dot)com>
> > > On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> > > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > > InitializeMaxBackends()
> > > > MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > > > - max_worker_processes;
> > > > + max_worker_processes + replication_reserved_connections;
> > > >
> > > > This means a walsender doesn't consume a connection, which is
> > > > different from the current behavior. We should reserve a part of
> > > > MaxConnections for walsenders. (in PostmasterMain,
> > > > max_wal_senders is counted as a part of MaxConnections)
> > >
> > > Yes. We can force replication_reserved_connections <= max_wal_senders
> > > and then reserved connections for replication should be a part of
> > > MaxConnections.
> > >
> > > >
> > > > + if (am_walsender && replication_reserved_connections < max_wal_senders
> > > > + && *procgloballist == NULL)
> > > > + procgloballist = &ProcGlobal->freeProcs;
> > > >
> > > > Currently an excessive number of walsenders is rejected in
> > > > InitWalSenderSlot, which emits the following error.
> > > >
> > > > > ereport(FATAL,
> > > > > (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > > > errmsg("number of requested standby connections "
> > > > > "exceeds max_wal_senders (currently %d)",
> > > > > max_wal_senders)));
> > > >
> > > > With this patch, if max_wal_senders =
> > > > replication_reserved_connections = 3 and the fourth walreceiver
> > > > comes, we will get "FATAL: sorry, too many clients already"
> > > > instead. It should be fixed.
> > > >
> > > > When r_r_conn = 2 and max_wal_senders = 3 and the three
> > > > walsenders are active, in an extreme case where a new replication
> > > > connection comes at the same time another is exiting, we could
> > > > end up using two normal slots even though one reserved slot is
> > > > vacant.
> > >
> > > Doesn't max_wal_senders prevent that case?
> >
> > Currently the variable doesn't work that way. We first accept the
> > connection request, then search for a vacant slot in
> > InitWalSenderSlot, and reject the connection if no room is
> > available. Even with this patch, we don't count the accurate
> > number of active walsenders (for performance reasons). If the
> > reserved slots are filled, there's no way other than to accept the
> > connection using a non-reserved slot if r_r_conn <
> > max_wal_senders. Suppose one of the active walsenders goes away
> > between the time we allocate the non-reserved connection slot and
> > the time InitWalSenderSlot starts searching the walsnds[] array.
> > Then the new walsender on the non-reserved slot is activated, and
> > one reserved slot is left empty. So this is "an extreme case". We
> > could ignore the case.
> >
> > I doubt that we should allow a setting where r_r_conn <
> > max_wal_senders, or even r_r_conn != max_wal_senders. We don't
> > have a problem like this if we don't allow those cases.
> >
> > > Wal senders can get connection if we have free procs more than
> > > (MaxConnections - reserved for superusers). So I think for normal
> > > users the connection must be refused if (MaxConnections - (reserved
> > > for superuser and replication) > # of freeprocs) and for wal senders
> > > the connection also must be refused if (MaxConnections - (reserved for
> > > superuser) > # of freeprocs). I'm not sure we need such a trick in
> > > InitWalSenderSlot().
> >
> > (For clarity, I don't mean my previous patch is a good solution.)
> >
> > It works as long as we accept that some reserved slots can be left
> > unused even though some walsenders are using normal slots. (Just
> > exiting a walsender that uses a reserved slot causes this, but the
> > slot is usually occupied by a walsender that comes later.)
> >
> > Another idea is to acquire a walsnd[] slot before getting a
> > connection slot.
>
> After more thought, I'm inclined to agree to reserve max_wal_senders
> slots and not to have replication_reserved_connections parameter.
>
> For superuser_reserved_connections, it actually works so that we
> always keep slots reserved for superusers when slots are almost
> full, regardless of who is using the other slots, including
> superusers themselves. But replication connections require different
> behaviour because they have another limit (max_wal_senders). If we have
> replication_reserved_connections < max_wal_senders, it could end up
> with the same issue as the one originally reported on this thread.
> Therefore many users would set replication_reserved_connections =
> max_wal_senders.
>
> On the other hand, if we always reserve max_wal_senders slots, the
> slots available for normal backends will decrease in the next
> release, which requires users to reconfigure max_connections.
> But this behavior seems more natural to me than the current one, so I
> think the reconfiguration is acceptable for users.
>
>
Maybe what we should do instead is not consider max_wal_senders a part of
the total number of connections, and instead size the things that need to
be sized by it using max_connections + max_wal_senders. That seems more
logical given how the parameters are named as well.
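
To make the difference concrete, here is a minimal standalone sketch in C. It
is not PostgreSQL source: the Settings struct and all four function names are
hypothetical, and the "proposed" formula is only one reading of the suggestion
above. It also ignores superuser_reserved_connections for simplicity. The
"current" formula follows the InitializeMaxBackends() expression quoted
earlier in this message.

```c
#include <assert.h>

/* Hypothetical container for the settings discussed in this thread. */
typedef struct
{
    int max_connections;
    int max_wal_senders;
    int autovacuum_max_workers;
    int max_worker_processes;
} Settings;

/*
 * Current sizing, as quoted upthread: walsenders take their slots out of
 * max_connections, so max_wal_senders does not appear in the sum.
 */
int
max_backends_current(const Settings *s)
{
    assert(s->max_wal_senders <= s->max_connections);
    return s->max_connections + s->autovacuum_max_workers + 1 +
        s->max_worker_processes;
}

/*
 * Assumed reading of the proposal: walsender slots are added on top of
 * max_connections instead of being carved out of it.
 */
int
max_backends_proposed(const Settings *s)
{
    return s->max_connections + s->autovacuum_max_workers + 1 +
        s->max_worker_processes + s->max_wal_senders;
}

/* Slots left for regular backends when every walsender slot is in use. */
int
normal_slots_current(const Settings *s)
{
    return s->max_connections - s->max_wal_senders;
}

/* Under the assumed proposal, walsenders never eat into max_connections. */
int
normal_slots_proposed(const Settings *s)
{
    return s->max_connections;
}
```

With max_connections = 100, max_wal_senders = 10, autovacuum_max_workers = 3
and max_worker_processes = 8, the first formula gives 112 slots and the second
122. In the worst case regular backends are left with 90 slots under the
current accounting and with all 100 under the proposed one, so nobody would
have to raise max_connections to keep the same capacity.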

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
