Re: Online checksums patch - once again

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online checksums patch - once again
Date: 2020-01-23 17:23:09
Message-ID: CA+TgmoaryNACeLnLpfhx==gYy1+59VhMtrWspcZGay9eDvYHMw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 23, 2020 at 6:19 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> A bigger question is how to handle the offline capabilities. pg_checksums can
> enable or disable checksums in an offline cluster, which will put the cluster
> in a state where the pg_control file and the catalog don't match at startup.
> One strategy could be to always trust the pg_control file and alter the catalog
> accordingly, but that still leaves a window of inconsistent cluster state.

I suggest that we define things so that the catalog state is only
meaningful during a state transition. That is, suppose the cluster
state is either "on", "enabling", or "off". When it's "on", checksums
are written and verified. When it is "off", checksums are not written
and not verified. When it's "enabling", checksums are written but not
verified. Also, when and only when the state is "enabling", the
background workers that try to rewrite relations to add checksums run,
and those workers look at the catalog state to figure out what to do.
Once the state changes to "on", those workers don't run any more, and
so the catalog state does not make any difference.
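
As a rough illustration of the three states described above (a Python
sketch, not PostgreSQL code; the names ChecksumState, writes_checksums,
verifies_checksums and workers_run are made up for this example), the
rules can be written down as three small predicates:

```python
from enum import Enum


class ChecksumState(Enum):
    """Cluster-wide checksum state (hypothetical names for illustration)."""
    OFF = "off"
    ENABLING = "enabling"
    ON = "on"


def writes_checksums(state):
    # Checksums are written in both "enabling" and "on".
    return state in (ChecksumState.ENABLING, ChecksumState.ON)


def verifies_checksums(state):
    # Checksums are verified only once the cluster is fully "on".
    return state is ChecksumState.ON


def workers_run(state):
    # The background rewrite workers run when, and only when, the state is
    # "enabling"; that is the only time the catalog state is consulted.
    return state is ChecksumState.ENABLING
```

The point of the sketch is that workers_run() is true in exactly one
state, so the per-relation catalog data only has to be correct while
that state holds.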

A tricky problem is handling the case where the state is switched
from "enabling" to "on" and then back to "off" and then to "enabling"
again. You don't want to confuse the state from the previous round of
enabling with the state for the current round of enabling. Suppose in
addition to storing the cluster-wide state of on/off/enabling, we also
store an "enable counter" which is incremented every time the state
goes from "off" to "enabling". Then, for each database and relation,
we store a counter that indicates the value of the enable counter at
the time we last scanned/rewrote that relation to set checksums. Now,
you're covered. And, to save space, it can probably be a 32-bit
counter, since 4 billion disable/reenable cycles ought to be enough
for anybody.
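
A minimal sketch of the counter idea, in Python rather than anything
resembling the real catalog code. The class ChecksumCluster, its method
names, and the dictionary standing in for per-relation storage are all
assumptions made for illustration; 32-bit wraparound is not modelled.

```python
class ChecksumCluster:
    """Toy model of the cluster-wide state plus an enable counter."""

    def __init__(self):
        self.state = "off"
        self.enable_counter = 0
        # relation name -> value of enable_counter when it was last rewritten
        self.last_enabled = {}

    def start_enabling(self):
        if self.state != "off":
            raise ValueError("can only start enabling from 'off'")
        # Each off -> enabling transition starts a new round.
        self.enable_counter += 1
        self.state = "enabling"

    def mark_rewritten(self, relname):
        # A worker records which round this relation was processed in.
        self.last_enabled[relname] = self.enable_counter

    def needs_rewrite(self, relname):
        # A value left over from an earlier round never matches the current one.
        return self.last_enabled.get(relname, 0) != self.enable_counter

    def finish_enabling(self):
        if self.state != "enabling":
            raise ValueError("not currently enabling")
        self.state = "on"

    def disable(self):
        self.state = "off"
```

With this model, a relation marked as done in round 1 is reported as
needing a rewrite again after an off/enabling cycle bumps the counter to
2, which is the confusion the counter is meant to prevent.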

It would not be strictly necessary to store this in pg_class. Another
thing that could be done is to store it in a separate system table
that could even be truncated when enabling is not in progress - though
it would be unwise to assume that it's always truncated at the
beginning of an enabling cycle, since it would be hard to guarantee
that the previous enabling cycle didn't fail when trying to truncate.
So you'd probably still end up with something like the counter
approach. I am inclined to think that inventing a whole new catalog
for this is over-engineering, but someone might think differently.
Note that creating a table while enabling is in progress needs to set
the enabling counter for the new table to the new value of the
enabling counter, not the old one, because the new table starts empty
and won't end up with any pages that don't have valid checksums.
Similarly, TRUNCATE, CLUSTER, VACUUM FULL, and rewriting variants of
ALTER TABLE can set the new value for the enabling counter as a side
effect. That's probably easier and more efficient if it's just a value
in pg_class than if they have to go poking around in another catalog.
So I am tentatively inclined to think that just putting it in pg_class
makes more sense.
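
To illustrate the side-effect behaviour, here is a small Python sketch.
It is a toy model only: the Catalog class, the per-relation counter
kept in a dictionary named pg_class, and the method names are
hypothetical and do not correspond to existing PostgreSQL code.

```python
class Catalog:
    """Toy stand-in for pg_class with a per-relation enabling counter."""

    def __init__(self, enable_counter):
        # Current cluster-wide enable counter (enabling assumed in progress).
        self.enable_counter = enable_counter
        # relation name -> counter value stored for that relation
        self.pg_class = {}

    def create_table(self, relname):
        # A new table starts empty, so every page it ever gets is written
        # with a checksum; stamp it with the current counter right away.
        self.pg_class[relname] = self.enable_counter

    def rewrite_table(self, relname):
        # Stands in for TRUNCATE, CLUSTER, VACUUM FULL and rewriting
        # ALTER TABLE: all pages are freshly written, so the relation can
        # be stamped as a side effect.
        self.pg_class[relname] = self.enable_counter

    def pending_for_worker(self):
        # Relations the background workers still have to process.
        return sorted(
            rel for rel, counter in self.pg_class.items()
            if counter != self.enable_counter
        )
```

For example, with two tables left over from round 1 and the counter now
at 2, a newly created table never shows up in pending_for_worker(), and
rewriting one of the old tables removes it from the list without a
worker touching it.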

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
