Re: Add GoAway protocol message for graceful but fast server shutdown/switchover

From: "Jelte Fennema-Nio" <postgres(at)jeltef(dot)nl>
To: "Tomas Vondra" <tomas(at)vondra(dot)me>
Cc: "Zsolt Parragi" <zsolt(dot)parragi(at)percona(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Dave Cramer" <davecramer(at)gmail(dot)com>, "Jacob Champion" <jacob(dot)champion(at)enterprisedb(dot)com>, "Heikki Linnakangas" <hlinnaka(at)iki(dot)fi>, <jnasby(at)upgrade(dot)com>
Subject: Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
Date: 2026-03-24 14:27:31
Message-ID: DHB2ZT3ZN8L5.21CRG9GA9317G@jeltef.nl
Lists: pgsql-hackers


On Fri, 20 Mar 2026 at 20:20, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> It'd be very helpful if there was some sort of PoC
> support on the pooler/client side, so that I can experiment with it and
> see how helpful the new protocol message is. But I realize that's a bit
> too much to ask for.

I'll see if I can whip something up, it shouldn't be too hard.

> Why not to have a pg_goaway_backend() function, that'd send the
> message to a single backend?

I like this idea a lot, so I added it in the attached v8 patch. This
also allowed me to add low-level tests using the libpq_pipeline
test suite.

> * In fact, does it improve the smart shutdown case in practice? Let's
> say we have a single instance, and we're restarting it. It'll send
> GoAway to all the clients, the good clients will try to reconnect. But
> if there's even a single "bad" client ignoring the GoAway, all the
> well-behaved clients will get stuck. Ofc, that can happen without the
> GoAway message too - a client may disconnect because of timeout etc. But
> it makes it more likely, and it'll affect the well-behaved clients.

For primary server restarts, I don't think anyone should be using smart
shutdown right now either. Any new connections to the database will be
failing for an indeterminate amount of time. I agree that sending GoAway
might worsen the problem in some cases, but it's already terrible to
start with. Fast shutdown is the only sensible restart mode for a
primary server. This seems to be generally accepted knowledge, given
that we use SIGINT (fast shutdown) in our systemd example[1].

Sending a GoAway on smart shutdown makes that shutdown mode very useful
for read replicas during a planned switch-over to another replica. Now
clients can finish their work and quickly reconnect to the new read
replica, minimizing switchover time while preventing errors.

Even when restarting primary servers, triggering a smart shutdown has
benefits, as long as it's followed by a fast shutdown after a short
delay (e.g., 1 second). This causes slightly longer downtime (the
additional delay), but it allows most clients to disconnect on their own
terms instead of in the middle of a query. Connection errors can often
be retried transparently more easily than errors in the middle of a
query. In effect, for many applications, this could mean a reduction in
errors and only an increase in latency during a restart.
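
To make the smart-then-fast sequence concrete, here is a rough sketch of
what a restart script could do. It assumes the usual postmaster signal
mapping (SIGTERM for smart shutdown, SIGINT for fast shutdown). The
function name smart_then_fast and its parameters are made up for this
example and are not part of the patch.

```python
import os
import signal
import time

# Assumption: the postmaster treats SIGTERM as a smart shutdown request and
# SIGINT as a fast shutdown request.
SMART_SHUTDOWN = signal.SIGTERM
FAST_SHUTDOWN = signal.SIGINT


def smart_then_fast(postmaster_pid, grace_seconds=1.0,
                    kill=os.kill, sleep=time.sleep):
    """Request a smart shutdown, wait briefly, then escalate to fast.

    kill and sleep are injectable so the sequence can be dry-run.
    Returns True if the fast shutdown signal was delivered, False if the
    server had already exited during the grace period.
    """
    # Smart shutdown: with the patch, this is when clients would get GoAway.
    kill(postmaster_pid, SMART_SHUTDOWN)
    # Give well-behaved clients a moment to disconnect on their own terms.
    sleep(grace_seconds)
    try:
        # Fast shutdown: terminate whatever sessions are still connected.
        kill(postmaster_pid, FAST_SHUTDOWN)
    except ProcessLookupError:
        # The smart shutdown already completed; nothing left to escalate.
        return False
    return True
```

The grace period is the only extra downtime this adds, so it should stay
short; the 1 second default just mirrors the example delay above.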

> * Would it make sense to have some payload in the GoAway message? I'm
> thinking about (a) some deadline by which the client should disconnect,
> e.g. time of planned restart / shutdown, (b) priority, expressing how
> much the client should try to disconnect (and maybe take more drastic
> actions).

I thought some more about this, but ultimately, the payloads you suggest
only seem useful if a client has something in between "disconnect hard
now" and "disconnect when the connection is unused". I cannot think of
any such cases, i.e. what other "drastic actions" could a client take
instead of simply closing the connection? If that's the only
possibility, why not simply have the server close the connection in that
case?

Overall, I agree that having no payload in this new message feels a bit
weird. But ultimately, clients don't need any payload to do something
useful.
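
As an illustration of the "disconnect when the connection is unused"
behaviour, here is a minimal sketch of how a pooler or driver might
react to a payload-less GoAway. The PooledConnection class and its
method names are hypothetical and do not correspond to any existing
driver API or to code in the patch.

```python
class PooledConnection:
    """Hypothetical client-side connection wrapper (not a real driver API)."""

    def __init__(self, name):
        self.name = name
        self.in_use = False
        self.goaway_received = False
        self.closed = False

    def on_goaway(self):
        # The server asked us to leave. Close right away if idle, otherwise
        # remember the request and close once the current work is done.
        self.goaway_received = True
        if not self.in_use:
            self.close()

    def acquire(self):
        # Never hand out a connection that is closed or draining.
        if self.closed or self.goaway_received:
            raise RuntimeError("connection is draining; open a new one")
        self.in_use = True

    def release(self):
        # Work finished: this is the "connection is unused" moment.
        self.in_use = False
        if self.goaway_received:
            self.close()

    def close(self):
        self.closed = True
```

In this sketch the message needs no payload: the only decision the
client makes is whether the connection is currently in use.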

> Also, two minor comments:

Fixed.

Attachment: v8-0001-Add-GoAway-protocol-message-for-graceful-but-fast.patch (text/x-patch, 38.9 KB)
