Re: Offline enabling/disabling of data checksums

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Magnus Hagander <magnus(at)hagander(dot)net>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Sergei Kornilov <sk(at)zsrv(dot)org>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Offline enabling/disabling of data checksums
Date: 2019-03-19 16:47:17
Message-ID: 20190319164717.ex2pw66zusbdqg2o@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-03-19 17:30:16 +0100, Michael Banck wrote:
> Am Dienstag, den 19.03.2019, 09:13 -0700 schrieb Andres Freund:
> > On 2019-03-19 17:08:17 +0100, Michael Banck wrote:
> > > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund:
> > > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote:
> > > > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund:
> > > > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote:
> > > > > > > +/*
> > > > > > > + * Locations of persistent and temporary control files. The control
> > > > > > > + * file gets renamed into a temporary location when enabling checksums
> > > > > > > + * to prevent a parallel startup of Postgres.
> > > > > > > + */
> > > > > > > +#define CONTROL_FILE_PATH "global/pg_control"
> > > > > > > +#define CONTROL_FILE_PATH_TEMP CONTROL_FILE_PATH ".pg_checksums_in_progress"
> > > > > >
> > > > > > I think this should be outright rejected. Again, you're making the
> > > > > > control file into something it isn't. And there's no buyin for this as
> > > > > > far as I can tell outside of Fabien and you. For crying out loud, if the
> > > > > > server crashes during this YOU'VE CORRUPTED THE CLUSTER.
> > > > >
> > > > > The cluster is supposed to be offline during this. This is just an
> > > > > additional precaution so that nobody starts it during the operation -
> > > > > similar to how pg_upgrade disables the old data directory.
> > > >
> > > > I don't see how that matters. Afterwards the cluster needs low level
> > > > surgery to be recovered. That's a) undocumented b) likely to be done
> > > > wrongly. This is completely unacceptable *AND UNNECESSARY*.
> > >
> > > Can you explain why low level surgery is needed and how that would look
> > > like?
> > >
> > > If pg_checksums successfully enables checksums, it will move back the
> > > control file and update the checksum version - the cluster is ready to
> > > be started again unless I am missing something?
> > >
> > > If pg_checksums is interrupted by the admin, it will move back the
> > > control file and the cluster is ready to be started again as well.
> > >
> > > If pg_checksums aborts with a failure, the admin will have to move back
> > > the control file before starting up the instance again, but I don't
> > > think that counts?
> >
> > That absolutely counts. Even a short period would imo be unacceptable,
> > but this will take *hours* in many clusters. It's completely possible
> > that the machine crashes while the enabling is in progress.
> >
> > And after restarting postgres or even pg_checksums, you'll just get a
> > message that there's no control file. How on earth is a normal user
> > supposed to recover from that? Now, you could have a check for the
> > control file under the temporary name, and emit a hint about renaming,
> > but that has its own angers (like people renaming it just to start
> > postgres).
>
> Ok, thanks for explaining. 
>
> I guess if we check for the temporary name in postmaster during startup
> if pg_control isn't there then a more generally useful name like
> "pg_control.maintenance" should be picked. We could then spit out a nice
> error message or hint like "the cluster has been disabled for
> maintenance. In order to start it up anyway, rename
> pg_control.maintenance to pg_control" or so.

To be very clear: I am going to try to stop any patch with this
mechanism from going into the tree. I think it's an absurdly bad
idea. There'd need to be significant support from a number of other
committers to convince me otherwise.

> In any case, this would be more of a operational or availability issue
> and not a data-loss issue, as I feared from your previous mails.

It's just about undistinguishable for normal users.

> > And you're basically adding it because Fabien doesn't like
> > postmaster.pid and wants to invent another lockout mechanism in this
> > thread.
>
> I think the hazard of another DBA (or some automated configuration
> management or HA tool for that matter) looking at postmaster.pid,
> deciding that it is not a legit file from a running instance, deleting
> it and then starting up Postgres while pg_checksums is still at work is
> worse than the above scenario, but maybe if we make the content of
> postmaster.pid clear enough (like "maintenance in progress"?) it would
> be enough of a hint?

Err, how would such a tool decide to do that safely? And if it did so,
how would it not cause problems in postgres' normal operation, given
that that postmaster.pid is crucial to prevent two postgres instances
running at the same time?

> Or do you have concrete suggestions on how this should work?

create a postmaster.pid with the pid of pg_checksums. That'll trigger a
postgres start from checking whether that pid is still alive. There'd
probably need to be a tiny change to CreateLockFile() to prevent it from
checking whether any shared memory is connected. FWIW, it'd probably
actually be good if pg_checksums (and some other tools), did most if not
all the checks in CreateLockFile().

I'm not sure it needs to be this patch's responsibility to come up with
a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all
don't really have a lockout mechanism, and it hasn't caused a ton of
problems. I think it'd be good to invent something better, but it can't
be some half assed approach that'll lead to people think their database
is gone.

> I had the feeling (ab)using postmaster.pid for this would fly less than
> using the same scheme as pg_upgrade does, but I'm fine doing it either
> way.

I don't think pg_upgrade is a valid comparison, given that the goal
there is to permanently disable the cluster. It also emits a hint about
it. And only does so at the end of a run.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-19 16:51:39 Re: Willing to fix a PQexec() in libpq module
Previous Message Robert Haas 2019-03-19 16:45:09 Re: Libpq support to connect to standby server as priority