Re: Connection slots reserved for replication

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Connection slots reserved for replication
Date: 2018-11-09 00:58:51
Message-ID: CAD21AoCdf8ZfRdOc9Cwfpd4jfiBJfoa_2oweM-kuZOg4ndh0KA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 8, 2018 at 9:30 PM Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Hello.
>
> At Wed, 7 Nov 2018 19:31:00 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoASCq808+iqcFoVuLu-+i8kon=6wN3+sY=EVKGm-56qig(at)mail(dot)gmail(dot)com>
> > On Tue, Nov 6, 2018 at 9:16 PM Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > > InitializeMaxBackends()
> > > MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
> > > - max_worker_processes;
> > > + max_worker_processes + replication_reserved_connections;
> > >
> > > This means walsender doesn't consume a connection, which is
> > > different from the current behavior. We should reserve a part of
> > > MaxConnections for walsenders. (in PostmasterMain,
> > > max_wal_senders is counted as a part of MaxConnections)
> >
> > Yes. We can force replication_reserved_connections <= max_wal_senders
> > and then reserved connections for replication should be a part of
> > MaxConnections.
> >
> > >
> > > + if (am_walsender && replication_reserved_connections < max_wal_senders
> > > + && *procgloballist == NULL)
> > > + procgloballist = &ProcGlobal->freeProcs;
> > >
> > > Currently an excessive number of walsenders is rejected in
> > > InitWalSenderSlot, which emits the following error.
> > >
> > > > ereport(FATAL,
> > > >         (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> > > >          errmsg("number of requested standby connections "
> > > >                 "exceeds max_wal_senders (currently %d)",
> > > >                 max_wal_senders)));
> > >
> > > With this patch, if max_wal_senders =
> > > replication_reserved_connections = 3 and the fourth walreceiver
> > > comes, we will get "FATAL: sorry, too many clients already"
> > > instead. It should be fixed.
> > >
> > > When r_r_conn = 2, max_wal_senders = 3, and three walsenders
> > > are active, in an extreme case where a new replication
> > > connection arrives at the same time another is exiting, we
> > > could end up using two normal slots even though one of the
> > > reserved slots is vacant.
> >
> > Doesn't the max_wal_senders prevent the case?
>
> Currently the variable doesn't work that way. We first accept the
> connection request, search for a vacant slot in InitWalSenderSlot,
> and reject the connection if no room is available. Even with this
> patch, we don't keep an exact count of active walsenders (for
> performance reasons). If the reserved slots are filled, there's no
> way other than to accept the connection using a non-reserved slot
> as long as r_r_conn < max_wal_senders. If one of the active
> walsenders goes away between the time we allocate the non-reserved
> connection slot and the time InitWalSenderSlot starts searching the
> walsnds[] array, the new walsender ends up activated on the
> unreserved slot and one reserved slot is left empty. So this is "an
> extreme case"; we could ignore it.
>
> I doubt that we should allow a setting where r_r_conn <
> max_wal_senders, or even r_r_conn != max_wal_senders. We wouldn't
> have a problem like this if we didn't allow those cases.
>
> > WAL senders can get a connection if we have more free procs than
> > (MaxConnections - reserved for superusers). So I think for normal
> > users the connection must be refused if (MaxConnections -
> > (reserved for superuser and replication)) > # of free procs, and
> > for WAL senders the connection must also be refused if
> > (MaxConnections - (reserved for superuser)) > # of free procs.
> > I'm not sure we need such a trick in InitWalSenderSlot().
>
> (For clarity, I don't mean that my previous patch is a good solution.)
>
> It works as long as we accept that some reserved slots can be left
> unused even though some walsenders are using normal slots. (A
> walsender using a reserved slot simply exiting causes this, but the
> freed slot is usually taken over by a walsender that comes later.)
>
> Another idea is to acquire a walsnds[] slot before getting a
> connection slot.

After more thought, I'm inclined to agree that we should reserve
max_wal_senders slots and not have a replication_reserved_connections
parameter.
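
Roughly what I have in mind is something like the following. This is
an untested sketch, not a patch; walsenderFreeProcs is a field name
I'm inventing here (it would have to be added to PROC_HDR), and the
split in InitProcGlobal() is just one possible way to carve the
reserved slots out of MaxConnections.

    /* In InitProcGlobal(): hand the last max_wal_senders of the
     * MaxConnections PGPROCs to a dedicated free list instead of
     * freeProcs, so regular backends can never occupy them.
     * walsenderFreeProcs is a hypothetical new PGPROC * field in
     * PROC_HDR. */
    if (i < MaxConnections - max_wal_senders)
    {
        /* PGPROC for a normal backend, add to freeProcs list */
        procs[i].links.next = (SHM_QUEUE *) ProcGlobal->freeProcs;
        ProcGlobal->freeProcs = &procs[i];
        procs[i].procgloballist = &ProcGlobal->freeProcs;
    }
    else if (i < MaxConnections)
    {
        /* PGPROC reserved for a walsender */
        procs[i].links.next = (SHM_QUEUE *) ProcGlobal->walsenderFreeProcs;
        ProcGlobal->walsenderFreeProcs = &procs[i];
        procs[i].procgloballist = &ProcGlobal->walsenderFreeProcs;
    }

    /* In InitProcess(): walsenders draw PGPROCs only from their own
     * list; everyone else keeps the existing behaviour. */
    if (am_walsender)
        procgloballist = &ProcGlobal->walsenderFreeProcs;
    else if (IsAnyAutoVacuumProcess())
        procgloballist = &ProcGlobal->autovacFreeProcs;
    else if (IsBackgroundWorker)
        procgloballist = &ProcGlobal->bgworkerFreeProcs;
    else
        procgloballist = &ProcGlobal->freeProcs;

This keeps walsenders within MaxConnections as discussed above, but
it doesn't by itself fix the error-message problem you pointed out:
once the walsender list is empty, InitProcess() would still report
"sorry, too many clients already" instead of the max_wal_senders
message from InitWalSenderSlot().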

superuser_reserved_connections actually works so that we reliably
reserve slots for superusers when the slots are almost full,
regardless of who is using the other slots, including superusers
themselves. But replication connections require a different behaviour
because they have another limit (max_wal_senders). If we allowed
replication_reserved_connections < max_wal_senders, we could end up
with the same issue as originally reported on this thread. Therefore
many users would set replication_reserved_connections =
max_wal_senders anyway.
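
For reference, the superuser reservation is just this free-proc-count
check in InitPostgres() (quoted from memory, so the exact code and
message text may differ slightly):

    /* Existing check: keep the last ReservedBackends slots for
     * superusers; walsenders are explicitly subject to it too. */
    if ((!am_superuser || am_walsender) &&
        ReservedBackends > 0 &&
        !HaveNFreeProcs(ReservedBackends))
        ereport(FATAL,
                (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
                 errmsg("remaining connection slots are reserved for "
                        "non-replication superuser connections")));

Since walsenders fall under that check like any ordinary backend,
regular connections can crowd them out, which is exactly what was
reported at the start of this thread.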

On the other hand, if we always reserve max_wal_senders slots, the
slots available to normal backends will decrease in the next release,
which requires users to re-configure max_connections. For example,
with max_connections = 100 and max_wal_senders = 10, normal backends
could previously use up to 100 slots but would now be limited to 90.
But this behavior seems more natural to me than the current one, so I
think the re-configuration is acceptable for users.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
