Re: when the startup process doesn't

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: when the startup process doesn't
Date: 2021-04-21 18:36:24
Message-ID: 20210421183624.GP20766@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2021-04-20 14:56:58 -0400, Tom Lane wrote:
> > I wonder though whether we really need authentication here. pg_ping
> > already exposes whether the database is up, to anyone who can reach the
> > postmaster port at all. Would it be so horrible if the "can't accept
> > connections" error message included a detail about "recovery is X%
> > done"?
>
> Unfortunately I think something like a percentage is hard to calculate
> right now. Even just looking at crash recovery (vs replication or
> PITR), we don't currently know where the WAL ends without reading all
> the WAL. The easiest thing to return would be something in LSNs or
> bytes and I suspect that we don't want to expose either unauthenticated?

While it obviously wouldn't be exactly accurate, I wonder if we couldn't
just look at the WAL files we have to replay and then guess that we'll go
through about half of them before we reach the end..? I mean, wouldn't
exactly be the first time that a percentage progress report wasn't
completely accurate. :)

> I wonder if we ought to occasionally update something like
> ControlFileData->minRecoveryPoint on primaries, similar to what we do on
> standbys? Then we could actually calculate a percentage, and it'd have
> the added advantage of allowing to detect more cases where the end of
> the WAL was lost. Obviously we'd have to throttle it somehow, to avoid
> adding a lot of fsyncs, but that seems doable?

This seems to go against Tom's concerns wrt rewriting pg_control.
Perhaps we could work through a solution to that, which would be nice,
but I'm not sure that we need the percentage to be super accurate
anyway. Ideally, though, we'd work it out so that it's always increasing
and doesn't look "stuck" as long as we're actually moving forward
through the WAL.
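If pg_control were updated periodically on a primary, as Andres suggests, the throttling could be as simple as enforcing a minimum interval between writes. That caps the number of extra fsyncs per unit of time. The sketch below is a minimal illustration under that assumption. ProgressHintThrottle and should_persist_hint are hypothetical names, not PostgreSQL code, and the caller would be responsible for the actual pg_control write and fsync.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* PostgreSQL LSNs are 64-bit byte positions */

/* Hypothetical throttle state; not an existing PostgreSQL structure. */
typedef struct
{
    XLogRecPtr last_written_lsn;    /* LSN recorded by the last persisted write */
    int64_t    last_written_ms;     /* time of the last persisted write, in ms */
} ProgressHintThrottle;

/*
 * Hypothetical helper: decide whether to persist a new progress hint.
 * Returns true (and records the write) only if the LSN has advanced and at
 * least min_interval_ms has passed since the last write, so the extra
 * fsyncs are capped at one per interval.
 */
static bool
should_persist_hint(ProgressHintThrottle *t, XLogRecPtr lsn,
                    int64_t now_ms, int64_t min_interval_ms)
{
    if (lsn <= t->last_written_lsn)
        return false;               /* nothing new to record */
    if (now_ms - t->last_written_ms < min_interval_ms)
        return false;               /* too soon since the last write */

    t->last_written_lsn = lsn;
    t->last_written_ms = now_ms;
    return true;
}
```

A purely time-based cap keeps the fsync rate bounded even under heavy write load. The cost is that the recorded point can lag the real end of WAL by up to one interval's worth of records.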

Maybe a heuristic of: look at the end of the WAL files, assume we'll go
through 50% of it, but only count that first half as 90%, with the last
10% covering the span from half-way through the WAL to the actual end of
the WAL available?
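The heuristic above maps the replay position onto a percentage in two linear pieces. The sketch below assumes progress is measured in LSN bytes, running from the point where recovery started to the end of the WAL available on disk. estimate_recovery_progress is a hypothetical name, not an existing PostgreSQL function.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* PostgreSQL LSNs are 64-bit byte positions */

/*
 * Hypothetical helper: estimate crash-recovery progress as 0.0 .. 100.0.
 * Assumes replay will probably stop about half-way through the available
 * WAL. The first half is reported as 0% .. 90%, and the second half
 * (up to the real end of available WAL) as the remaining 90% .. 100%.
 */
static double
estimate_recovery_progress(XLogRecPtr start, XLogRecPtr replayed,
                           XLogRecPtr wal_end)
{
    if (wal_end <= start || replayed >= wal_end)
        return 100.0;
    if (replayed <= start)
        return 0.0;

    double total = (double) (wal_end - start);
    double done = (double) (replayed - start);
    double half = total / 2.0;

    if (done <= half)
        return 90.0 * (done / half);                         /* 0% .. 90% */
    return 90.0 + 10.0 * ((done - half) / (total - half));   /* 90% .. 100% */
}
```

With start = 0 and wal_end = 1000, a replay position of 250 gives 45%, 500 gives 90%, and 750 gives 95%. Both pieces are increasing and meet at 90%, so the value never goes backwards while replay advances. If replay runs past the half-way guess, it slows down to cover only the last 10%. If replay ends earlier, recovery simply finishes before the estimate reaches 90%.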

Yes, such heuristics are terrible, but they're also relatively simple
and wouldn't require tracking anything additional and would, maybe,
avoid the concern about needing to authenticate the user.

Thanks,

Stephen
