Re: Online enabling of checksums

From: Andres Freund <andres(at)anarazel(dot)de>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Online enabling of checksums
Date: 2018-04-07 04:26:38
Message-ID: 20180407042638.hw6gbrhsnsxcv6ia@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> + /*
> + * Create a database list. We don't need to concern ourselves with
> + * rebuilding this list during runtime since any database created after
> + * this process started will be running with checksums turned on from the
> + * start.
> + */
>
> Why is this true? What if somebody runs CREATE DATABASE while the
> launcher / worker are processing a different database? It'll copy the
> template database on the filesystem level, and it very well might not
> yet have checksums set? Afaict the second time we go through this list
> that's not caught.

It's indeed trivial to reproduce this. Just slowing down a checksum run
and copying the database yields:
./pg_verify_checksums -D /srv/dev/pgdev-dev
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0
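
One way to close this gap is to stop trusting a list built once at startup and instead re-scan until a fresh scan finds nothing unprocessed. The following is only a minimal sketch of that idea, not what the patch does: `list_databases` and `process_database` are hypothetical callables standing in for the catalog scan and the per-database worker, and the sketch ignores dropped databases and locking.

```python
from typing import Callable, Iterable, Set


def enable_checksums_everywhere(
    list_databases: Callable[[], Iterable[str]],
    process_database: Callable[[str], None],
) -> Set[str]:
    """Process databases until a fresh scan finds nothing unprocessed.

    Both callables are hypothetical stand-ins, not PostgreSQL APIs.
    """
    done: Set[str] = set()
    while True:
        # Re-read the database list on every pass, so a database created
        # (copied from a not-yet-processed template) during the previous
        # pass is picked up here.
        pending = [db for db in list_databases() if db not in done]
        if not pending:
            return done
        for db in pending:
            process_database(db)
            done.add(db)


if __name__ == "__main__":
    catalog = ["template1", "postgres"]

    def process(db: str) -> None:
        # Simulate CREATE DATABASE running while "postgres" is processed.
        if db == "postgres" and "newdb" not in catalog:
            catalog.append("newdb")

    print(sorted(enable_checksums_everywhere(lambda: list(catalog), process)))
```

The loop terminates only when a complete scan comes back with no new names, which is the property the single startup list lacks.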

further complaints:

The new isolation test cannot be re-run on an existing cluster. That's
because the first test expects checksums to be disabled. As even
remarked upon:
# The checksum_enable suite will enable checksums for the cluster so should
# not run before anything expecting the cluster to have checksums turned off

How's that ok? You can leave database-wide objects around, but the
cluster-wide stuff needs to be cleaned up.

The tests don't actually make sure that no checksum launcher / worker is
running anymore. They just assume that it's gone once the GUC shows
checksums have been set. If you wanted to make the tests stable, you'd
need to wait for that to show true *and* then check that no workers are
around anymore.
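
The two-part wait condition described above could look roughly like the sketch below. It is an illustration under assumptions, not the test suite's code: `checksums_enabled` and `active_workers` are hypothetical callables that a real test would back with queries against the server, and the timeout values are arbitrary.

```python
import time
from typing import Callable


def wait_until_checksums_settled(
    checksums_enabled: Callable[[], bool],
    active_workers: Callable[[], int],
    timeout: float = 60.0,
    poll_interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once checksums are on AND no launcher/worker remains.

    Returns False if both conditions have not held by the deadline.
    The callables are hypothetical probes supplied by the caller.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Both conditions must hold in the same poll; the GUC flipping
        # alone does not prove the background processes have exited.
        if checksums_enabled() and active_workers() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)
```

Checking only the first condition is what makes the tests racy: a later step can start while a worker is still shutting down.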

If it's not obvious: This isn't ready, should be reverted, cleaned up,
and re-submitted for v12.

Greetings,

Andres Freund
