Re: Online enabling of checksums

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Michael Banck <michael(dot)banck(at)credativ(dot)de>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Online enabling of checksums
Date: 2018-03-15 13:01:26
Message-ID: 7FDA8717-89CE-453E-9AD6-4424597F3C95@yesql.se
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

> On 10 Mar 2018, at 16:09, Michael Banck <michael(dot)banck(at)credativ(dot)de> wrote:

> I had a closer look at v3 of the patch now.

Thanks, much appreciated! Sorry for the late response, just came back from a
conference and have had little time for hacking.

All whitespace, punctuation and capitalization comments have been addressed
with your recommendations, so I took the liberty to trim them from the
response.

> I am aware that this is discussed already, but as-is the user experience
> is pretty bad, I think pg_enable_data_checksums() should either bail
> earlier if it cannot connect to all databases, or it should be better
> documented.

Personally I think we should first attempt to solve the "allow-connections in
background workers” issue which has been raised on another thread. For now I’m
documenting this better.

>> diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
>> index f4bc2d4161..89afecb341 100644
>> --- a/doc/src/sgml/wal.sgml
>> +++ b/doc/src/sgml/wal.sgml
>> @@ -230,6 +230,73 @@
> [...]
>> + <para>
>> + Checksums can be enabled or disabled online, by calling the appropriate
>> + <link linkend="functions-admin-checksum">functions</link>.
>> + Disabling of checksums takes effect immediately when the function is called.
>> + </para>
>> +
>> + <para>
>> + Enabling checksums will put the cluster in <literal>inprogress</literal> mode.
>> + During this time, checksums will be written but not verified. In addition to
>> + this, a background worker process is started that enables checksums on all
>> + existing data in the cluster. Once this worker has completed processing all
>> + databases in the cluster, the checksum mode will automatically switch to
>> + <literal>on</literal>.
>> + </para>
>> +
>> + <para>
>> + If the cluster is stopped while in <literal>inprogress</literal> mode, for
>> + any reason, then this process must be restarted manually. To do this,
>> + re-execute the function <function>pg_enable_data_checksums()</function>
>> + once the cluster has been restarted.
>> + </para>
>
> Maybe document the above issue here, unless it is clear that the
> templat0-needs-to-allow-connections issue will go away before the patch
> is pushed.

I have added a paragraph on allowing connections here, as well as a note that
template0 will need to be handled.

>> +void
>> +SetDataChecksumsOn(void)
>> +{
>> + if (!DataChecksumsEnabledOrInProgress())
>> + elog(ERROR, "Checksums not enabled or in progress");
>> +
>> + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
>> +
>> + if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
>> + {
>> + LWLockRelease(ControlFileLock);
>> + elog(ERROR, "Checksums not in in_progress mode");
>
> The string used in "SHOW data_checksums" is "inprogress", not
> "in_progress”.

Fixed.

>> @@ -7769,6 +7839,16 @@ StartupXLOG(void)
>> CompleteCommitTsInitialization();
>>
>> /*
>> + * If we reach this point with checksums in inprogress state, we notify
>> + * the user that they need to manually restart the process to enable
>> + * checksums.
>> + */
>> + if (ControlFile->data_checksum_version == PG_DATA_CHECKSUM_INPROGRESS_VERSION)
>> + ereport(WARNING,
>> + (errmsg("checksum state is \"in progress\" with no worker"),
>> + errhint("Either disable or enable checksums by calling the pg_disable_data_checksums() or pg_enable_data_checksums() functions.")));
>
> Again, string is "inprogress", not "in progress", not sure if that
> matters.

I think it does, we need to be consistent in userfacing naming. Updated.

>> diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
>> new file mode 100644
>> index 0000000000..6aa71bcf30
>> --- /dev/null
>> +++ b/src/backend/postmaster/checksumhelper.c
>> @@ -0,0 +1,631 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * checksumhelper.c
>> + * Backend worker to walk the database and write checksums to pages
>
> Backend or Background?

“Background” is the right term here, fixed.

>> +static bool
>> +ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, BufferAccessStrategy strategy)
>> +{
>> + BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum);
>> + BlockNumber b;
>> +
>> + for (b = 0; b < numblocks; b++)
>> + {
>> + Buffer buf = ReadBufferExtended(reln, forkNum, b, RBM_NORMAL, strategy);
>> + Page page;
>> + PageHeader pagehdr;
>> + uint16 checksum;
>> +
>> + /* Need to get an exclusive lock before we can flag as dirty */
>> + LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
>> +
>> + /* Do we already have a valid checksum? */
>> + page = BufferGetPage(buf);
>> + pagehdr = (PageHeader) page;
>> + checksum = pg_checksum_page((char *) page, b);
>
> This looks like it does not take the segment number into account;
> however it is also unclear to me what the purpose of this is, as
> checksum is never validated against the pagehdr, and nothing is done
> with it.

> I guess the above block running pg_checksum_page() is a leftover from
> previous versions of the patch and should be removed…

Correct, the checksum and pagehdr was used in the previous optimization to skip
forced writes where the checksum was valid. That optimization was however
based on a faulty assumption and was removed in the v3 patch. The leftover
variables are now removed.

>> + * it. Get a fresh list of databases to detect the second case with.
>
> That sentence looks unfinished or at least is unclear to me.

Updated to indicate what we mean by “second case”.

>> + * Any database that still exists but failed we retry for a limited
>
> I'm not a native speaker, but this looks wrong to me as well, maybe "We
> retry any database that still exists but failed for a limited [...]”?

Updated and extended a bit for clarity.

Fixing this review comment also made me realize that checksumhelper.c was using
%s for outputting the database name in errmsg(), but \”%s\” is what we commonly
use. Updated all errmsg() invocations to quote the database name.

>> diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
>> index 0fe98a550e..4bb2b7e6ec 100644
>> --- a/src/bin/pg_upgrade/controldata.c
>> +++ b/src/bin/pg_upgrade/controldata.c
>> @@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl,
>> */
>>
>> /*
>> + * If checksums have been turned on in the old cluster, but the
>> + * checksumhelper have yet to finish, then disallow upgrading. The user
>> + * should either let the process finish, or turn off checksums, before
>> + * retrying.
>> + */
>> + if (oldctrl->data_checksum_version == 2)
>> + pg_fatal("transition to data checksums not completed in old cluster\n");
>> +
>> + /*
>> * We might eventually allow upgrades from checksum to no-checksum
>> * clusters.
>> */
>
> See below about src/bin/pg_upgrade/pg_upgrade.h having
> data_checksum_version be a bool.
>
> I checked pg_ugprade (from master to master though), and could find no
> off-hand issues, i.e. it reported all issues correctly.

data_checksum_version is indeed defined bool, but rather than being an actual
bool it’s a bool backed by a char via a typedef in c.h. This is why it works
to assign 0, 1 or 2 without issues.

That being said, I agree that it reads odd now that checksum version isn’t just
0 or 1 which mentally translate neatly to false or true. Since this is the
patch introducing version 2, I changed data_checksum_version to a uint32 as
that makes the intent much clearer.

>> +check:
>> + $(prove_check)
>> +
>> +installcheck:
>> + $(prove_installcheck)
>
> If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl
> error:

Thats because the check and installcheck targets are copy-pasteos, there are no
tests for pg_verify_checksums currently so the targets should not be in the
Makefile. Fixed.

>> + if (DataDir == NULL)
>> + {
>> + if (optind < argc)
>> + DataDir = argv[optind++];
>> + else
>> + DataDir = getenv("PGDATA");
>> + }
>> +
>> + /* Complain if any arguments remain */
>> + if (optind < argc)
>> + {
>> + fprintf(stderr, _("%s: too many command-line arguments (first is \"%s\")\n"),
>> + progname, argv[optind]);
>> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
>> + progname);
>> + exit(1);
>> + }
>> +
>> + if (DataDir == NULL)
>> + {
>> + fprintf(stderr, _("%s: no data directory specified\n"), progname);
>> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
>> + exit(1);
>> + }
>
> Those two if (DataDir == NULL) checks could maybe be put together into
> one block.

Moved the check into the first block, as it makes code clearr and doesn’t
change the order in which the error messages for missing datadir and
too-many-arguments will be output.

>> diff --git a/src/include/postmaster/checksumhelper.h b/src/include/postmaster/checksumhelper.h
>> new file mode 100644
>> index 0000000000..7f296264a9
>> --- /dev/null
>> +++ b/src/include/postmaster/checksumhelper.h
>> @@ -0,0 +1,31 @@
>> +/*-------------------------------------------------------------------------
>> + *
>> + * checksumhelper.h
>> + * header file for checksum helper deamon
>
> "deamon" is surely wrong (it'd be "daemon"), but maybe "(background)
> worker" is better?

Yes, "background worker” is better, updated.

>> diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
>> index 85dd10c45a..bd46bf2ce6 100644
>> --- a/src/include/storage/bufpage.h
>> +++ b/src/include/storage/bufpage.h
>> @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
>> */
>> #define PG_PAGE_LAYOUT_VERSION 4
>> #define PG_DATA_CHECKSUM_VERSION 1
>> +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION 2
>
> I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being
> 1, but I assumed it was a version, like, if we ever decide to use a
> different checksumming algorithm, we'd bump it to 2.
>
> Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree
> is convenient, but is there some strategy what to do about this in case
> the PG_DATA_CHECKSUM_VERSION needs to be increased?

I don’t think this has been discussed in any thread dealing with enabling
checksums in an online cluster, where using the version was mentioned (or my
archive search is failing me). Iff another algorithm was added I assume we’d
need something like the proposed checksumhelper to allow migrations from
version 1. Just as we now use version 2 to indicate that we are going from
version 0 to version 1, the same can be used for going from version 1 to 3 as
long as there is a source and destination version recorded.

I don’t know what such a process would look like, but I don’t see that we are
blocking a future new checksum algorithm by using version 2 for inprogress
(although I agree that using the version here is closer to convenient than
elegant).

Attached v4 of this patch, which addresses this review, and flipping status
back in the CF app back to Needs Review.

cheers ./daniel

Attachment Content-Type Size
online_checksums4.patch application/octet-stream 74.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-03-15 13:22:52 Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
Previous Message Aleksander Alekseev 2018-03-15 12:10:08 Re: Re: [submit code] I develop a tool for pgsql, how can I submit it