
Re: Online enabling of checksums

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject:	Re: Online enabling of checksums
Date:	2018-03-31 00:08:45
Message-ID:	5d41c57e-59c4-c99a-93c0-d24504beb452@2ndquadrant.com
Lists:	pgsql-hackers

Hi,

I've been looking at the patch a bit more, and I think there are a
couple of fairly serious issues in the error handling.

Firstly, ChecksumHelperLauncherMain spends quite a bit of effort on
skipping dropped databases, but ChecksumHelperWorkerMain does not do the
same thing with tables. I'm not exactly sure why, but I'd say dropped
tables are more likely than dropped databases (e.g. because of temporary
tables), and it's strange to gracefully handle only the rarer case.

Now, when a table gets dropped after BuildRelationList() does its work,
we end up calling ProcessSingleRelationByOid() on that OID, which calls
relation_open(), which fails with elog(ERROR), terminating the whole
bgworker with an error like this:

ERROR: could not open relation with OID 16632
LOG: background worker "checksumhelper worker" (PID 27152) exited with exit code 1

This, however, means the error handling in ChecksumHelperWorkerMain()
never gets a chance to kick in, because the bgworker dies right away. The
code looks like this:

foreach(lc, RelationList)
{
    ChecksumHelperRelation *rel
        = (ChecksumHelperRelation *) lfirst(lc);

    if (!ProcessSingleRelationByOid(rel->reloid, strategy))
    {
        ChecksumHelperShmem->success = ABORTED;
        break;
    }
    else
        ChecksumHelperShmem->success = SUCCESSFUL;
}
list_free_deep(RelationList);

Now, assume the first relation in the list still exists and gets
processed correctly, so "success" ends up being SUCCESSFUL. Then the
second OID is the dropped relation, which kills the bgworker ...

The launcher, however, does not realize anything went wrong, because the
flag still says SUCCESSFUL. And so it merrily switches checksums to
"on", leading to errors like this on the relations that were never processed:

WARNING: page verification failed, calculated checksum 58644 but expected 0
ERROR: invalid page in block 0 of relation base/16631/16653

Yikes!

IMHO this error handling is broken by design. Two things need to
happen, I think: (a) graceful handling of dropped relations and (b)
proper error reporting from the bgworker.

(a) Should not be difficult to do, I think. We don't have relation_open
with a missing_ok flag, but implementing something like that should be
easy enough. Even a simple "does this OID exist" check should be enough.

(b) But just handling dropped relations is not enough, because I could
simply kill the bgworker directly, and it would have exactly the same
consequences. What needs to happen is something like this:

ChecksumHelperResult local_success = SUCCESSFUL;

foreach(lc, RelationList)
{
    ChecksumHelperRelation *rel
        = (ChecksumHelperRelation *) lfirst(lc);

    if (!ProcessSingleRelationByOid(rel->reloid, strategy))
    {
        local_success = ABORTED;
        break;
    }
}
list_free_deep(RelationList);

ChecksumHelperShmem->success = local_success;

That is, leave the flag in shared memory set to FAILED until the very
last moment, and only set it to SUCCESSFUL when everything went fine.

BTW I don't think handling dropped relations by letting the bgworker
crash and restart is an acceptable approach. That would pretty much mean
any DDL changes are prohibited on the system while the checksum process
is running, which is not really feasible (e.g. for systems that make heavy
use of temporary tables).

Which reminds me that I've also run into a bug in the automated retry
system; you may get messages like this:

ERROR: failed to enable checksums in "test", giving up (attempts 639968292).

This happens because BuildDatabaseList() just does palloc() and does not
initialize the 'attempts' field. It may happen to be 0 by chance, but
I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely high
value.

BTW both ChecksumHelperRelation and ChecksumHelperDatabase have a
'success' field which is actually unused (and uninitialized).

But wait, there is more ;-) BuildRelationList is using heap_beginscan
with the regular snapshot, so it does not see tables created by
uncommitted transactions. So if you do this:

BEGIN;
CREATE TABLE t AS SELECT i FROM generate_series(1,10000000) s(i);
-- run pg_enable_data_checksums() from another session
SELECT COUNT(*) FROM t;

then the table will be invisible to the checksum worker, so its pages
won't get checksums, and the cluster will get checksums enabled anyway.
Which means this:

test=# SELECT COUNT(*) FROM t;
WARNING: page verification failed, calculated checksum 27170 but expected 0
ERROR: invalid page in block 0 of relation base/16677/16683

Not sure what the best way to fix this is; maybe we could wait for all
running transactions to end before starting the work.

And if you try this with a temporary table (not hidden in a transaction,
so the bgworker can see it), the worker will fail with this:

ERROR: cannot access temporary tables of other sessions

But of course, this is just another way to crash without updating the
result for the launcher, so checksums may end up being enabled anyway.

Not great, I guess :-(

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: Online enabling of checksums at 2018-03-20 14:40:53 from Magnus Hagander

Responses

Re: Online enabling of checksums at 2018-03-31 12:02:54 from Magnus Hagander
