| From: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
|---|---|
| To: | Jelte Fennema-Nio <me(at)jeltef(dot)nl> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dave Cramer <davecramer(at)gmail(dot)com>, Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Subject: | Re: Add GoAway protocol message for graceful but fast server shutdown/switchover |
| Date: | 2025-10-24 05:04:50 |
| Message-ID: | CALdSSPiORRiJ892J07tT3_xcAm=O9JCW-qHUQvFc6axHoDW_Og@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, 23 Oct 2025 at 18:05, Jelte Fennema-Nio <me(at)jeltef(dot)nl> wrote:
>
> This change introduces a new GoAway backend-to-frontend protocol
> message (byte 'g') that the server can send to the client to politely
> request that the client disconnect/reconnect when convenient. This message is
> advisory only - the connection remains fully functional and clients may
> continue executing queries and starting new transactions. "When
> convenient" is obviously not very well defined, but the primary target
> clients are clients that maintain a connection pool. Such clients should
> disconnect/reconnect a connection in the pool when there's no user of
> that connection. This is similar to how such clients often currently
> remove a connection from the pool after the connection hits a maximum
> lifetime of e.g. 1 hour.
>
> This new message is used by Postgres during the already existing "smart"
> shutdown procedure (i.e. when postmaster receives SIGTERM). When
> Postgres is in "smart" shutdown mode existing clients can continue to
> run queries as usual but new connection attempts are rejected. This mode
> is primarily useful when triggering a switchover of a read replica. A
> load balancer can route new connections only to the new read replica,
> while the old read replica keeps serving the existing connections until
> they disconnect. The problem is that this draining of connections can
> often take a long time, even when clients only run very short
> queries/transactions, because the session can be kept open much longer
> (many connection pools use 1 hour max lifetime of a connection by default).
> With the introduction of the GoAway message Postgres now sends this
> message to all connected clients when it enters smart shutdown mode.
> If these clients respond to the message by reconnecting/disconnecting
> earlier than their maximum connection lifetime the draining can complete
> much quicker. Similar benefits to switchover duration can be achieved
> for other applications or proxies implementing the Postgres protocol,
> like when switching over a cluster of PgBouncer machines to a newer
> version.
>
> Applications/clients that use libpq can periodically check the result of
> the new PQgoAwayReceived() function to find out whether they have been
> asked to reconnect.
Hi!
I'm +1 on this idea. This is something I wanted back in 2020, when
implementing the 'online restart' feature for odyssey[0], but never
bothered to create a thread.
Due to the complexity of its async engine, odyssey cannot simply reuse TCP
connections from the 'old' binary, so we accept new connections in the new
binary and try to drop connections in the old binary at some rate.
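As an illustration only, dropping connections "at some rate" could be sketched as below. This is not Odyssey's actual code: the `DrainBudget` struct, its fields, and `drain_allowance()` are hypothetical names, and the sketch assumes the caller invokes it from a periodic timer and then closes that many idle connections itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical drain state; the names are illustrative, not Odyssey's. */
typedef struct DrainBudget
{
    unsigned      rate_per_sec; /* target number of connections dropped per second */
    unsigned long carry;        /* unspent budget, in thousandths of a connection */
} DrainBudget;

/*
 * Return how many connections may be dropped after elapsed_ms milliseconds.
 * The result never exceeds `remaining`; fractional budget is carried over
 * to the next call so that short timer ticks still add up to the target rate.
 */
static size_t
drain_allowance(DrainBudget *b, unsigned elapsed_ms, size_t remaining)
{
    size_t allowed;

    assert(b != NULL);

    b->carry += (unsigned long) b->rate_per_sec * elapsed_ms;
    allowed = (size_t) (b->carry / 1000);
    if (allowed > remaining)
        allowed = remaining;
    b->carry -= (unsigned long) allowed * 1000;

    return allowed;
}
```

With a GoAway message, a budget like this would only be needed as a fallback for clients that ignore the hint.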
About patches:
In 0001:
>+
>+ if (strcmp(value, "latest") == 0)
>+ {
>+ *result = PG_PROTOCOL_LATEST;
>+ return true;
>+ }
Not needed? We already have this check at the beginning of
pqParseProtocolVersion.
In 0002:
> + The <literal>GoAway</literal> message is sent by the server during a
> + smart shutdown to politely request that clients disconnect.
I'm not sure this wording is foolproof. First of all, shouldn't it be
'client', not 'clients'? It looks like we should describe the interaction
between a single client and a single server in this doc.
Maybe also change the last sentence to '... to instruct clients to
disconnect.'? That wording may not be great either, but I want the doc
to reflect that disconnection is strongly advised, yet not obligatory.
> + Applications should check this flag
> + periodically and disconnect gracefully when possible, such as after
> + completing the current transaction or unit of work.
What flag? Also, 'Applications should' - no, they don't have to; isn't it
just an option? Maybe we should change the wording to something like
'Applications may treat a GoAway message as a recommendation to close (or
maybe re-open) their connection to the server as soon as they receive at
least one such message.'
Also, can the server send more than one 'GoAway' msg? If yes, should
we document this?
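To make the "advisory" and "more than one message" points concrete, here is a minimal sketch of how a pooling client might treat GoAway. It deliberately does not use libpq: `PooledConn` and the three functions are hypothetical, and `goaway_seen` merely stands in for whatever the proposed PQgoAwayReceived() would report. The assumption is that the flag is sticky, so repeated GoAway messages are harmless, and that a connection is only closed once nobody is using it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pooled connection; not a libpq type. */
typedef struct PooledConn
{
    bool open;           /* connection is established */
    bool in_use;         /* currently checked out by a caller */
    bool in_transaction; /* inside an open transaction */
    bool goaway_seen;    /* sticky: set on the first GoAway, kept until close */
} PooledConn;

/* Record a GoAway. Calling this repeatedly has no further effect. */
static void
conn_note_goaway(PooledConn *conn)
{
    assert(conn != NULL);
    conn->goaway_seen = true;
}

/* The hint is advisory: act on it only when the connection is idle. */
static bool
conn_should_recycle(const PooledConn *conn)
{
    assert(conn != NULL);
    return conn->open && conn->goaway_seen &&
           !conn->in_use && !conn->in_transaction;
}

/* Close every idle connection that was asked to go away; return the count. */
static size_t
pool_recycle_idle(PooledConn *conns, size_t n)
{
    size_t closed = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (conn_should_recycle(&conns[i]))
        {
            conns[i].open = false;        /* stand-in for closing the socket */
            conns[i].goaway_seen = false; /* a replacement would start clean */
            closed++;
        }
    }
    return closed;
}
```

A busy connection keeps working after the flag is set and is picked up by a later `pool_recycle_idle()` pass once it is returned to the pool.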
> - * notice. (An ERROR is very possibly the backend telling us why
> + * notice. (An ERROR is very possibly the backend telling us why
This change is unrelated
The other code changes look straightforward and are fine to me.
[0] https://github.com/yandex/odyssey
--
Best regards,
Kirill Reshke