Re: Online enabling of checksums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Online enabling of checksums
Date: 2018-04-07 06:57:03
Message-ID: CABUevEzhkmkNCHzQ_MuuqmmXXNLbEu1P08URuoG3uCrnBg6MgA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 7, 2018 at 6:26 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2018-04-06 17:59:28 -0700, Andres Freund wrote:
> > + /*
> > + * Create a database list. We don't need to concern ourselves with
> > + * rebuilding this list during runtime since any database created after
> > + * this process started will be running with checksums turned on from the
> > + * start.
> > + */
> >
> > Why is this true? What if somebody runs CREATE DATABASE while the
> > launcher / worker are processing a different database? It'll copy the
> > template database on the filesystem level, and it very well might not
> > yet have checksums set? Afaict the second time we go through this list
> > that's not cought.
>
> *caught
>
> It's indeed trivial to reproduce this, just slowing down a checksum run
> and copying the database yields:
> ./pg_verify_checksums -D /srv/dev/pgdev-dev
> pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 0: calculated checksum 45A7 but expected 0
> pg_verify_checksums: checksum verification failed in file "/srv/dev/pgdev-dev/base/16385/2703", block 1: calculated checksum 8C7D but expected 0
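
The race quoted above can be illustrated with a small toy model. This is only a sketch in Python, not PostgreSQL code: the cluster is a dict mapping database name to "all pages checksummed", and create_database, single_pass, rescanning_pass and make_hook are hypothetical names invented for the illustration. The rescanning variant is one possible way to close the gap, under the assumption that rebuilding the list is acceptable; it is not taken from the patch.

```python
def create_database(cluster, name, template):
    # CREATE DATABASE copies the template at the filesystem level, so the
    # new database inherits whatever checksum state the template has now.
    cluster[name] = cluster[template]


def single_pass(cluster, on_process=None):
    """Process a database list that is built once, before any work starts."""
    for db in list(cluster):          # snapshot of the list
        cluster[db] = True            # all pages rewritten with checksums
        if on_process:
            on_process(db)


def rescanning_pass(cluster, on_process=None):
    """Rebuild the list until a scan finds no unprocessed database."""
    done = set()
    while True:
        pending = [db for db in cluster if db not in done]
        if not pending:
            return
        for db in pending:
            cluster[db] = True
            done.add(db)
            if on_process:
                on_process(db)


def make_hook(cluster):
    # Simulates a concurrent CREATE DATABASE that runs while the worker is
    # busy with "postgres" and before it has reached "template1".
    def hook(db):
        if db == "postgres" and "newdb" not in cluster:
            create_database(cluster, "newdb", "template1")
    return hook


if __name__ == "__main__":
    a = {"postgres": False, "template1": False}
    single_pass(a, make_hook(a))
    print("single pass:", a)

    b = {"postgres": False, "template1": False}
    rescanning_pass(b, make_hook(b))
    print("rescanning: ", b)
```

In the single-pass run, "newdb" is copied from a template that has not been processed yet and is never visited, which corresponds to the "expected 0" failures shown above.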
>
>
>
> further complaints:
>
> The new isolation test cannot be re-run on an existing cluster. That's
> because the first test expects checksums to be disabled. As even
> remarked upon:
> # The checksum_enable suite will enable checksums for the cluster so should
> # not run before anything expecting the cluster to have checksums turned off
>
> How's that ok? You can leave database wide objects around, but the
> cluster-wide stuff needs to be cleaned up.
>
>
> The tests don't actually make sure that no checksum launcher / apply is
> running anymore. They just assume that it's gone once the GUC shows
> checksums have been set. If you wanted to make the tests stable, you'd
> need to wait for that to show true *and* then check that no workers are
> around anymore.
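
The two-part wait described above (the GUC shows checksums on *and* no workers remain) could look roughly like the following. This is a minimal sketch in Python, not part of the isolation test suite: wait_until_settled is a hypothetical helper, and checksums_on and worker_count are caller-supplied callables whose implementation (for example, queries against the running server) is assumed and not shown.

```python
import time


def wait_until_settled(checksums_on, worker_count,
                       timeout=30.0, interval=0.1,
                       sleep=time.sleep, clock=time.monotonic):
    """Return True once checksums are reported on and no worker is left.

    checksums_on: callable returning True when the setting shows "on".
    worker_count: callable returning the number of checksum launcher and
                  worker processes still running.
    Returns False if both conditions are not met before the timeout.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        # Both conditions are checked together: the setting alone flipping
        # to "on" does not prove that the background processes have exited.
        if checksums_on() and worker_count() == 0:
            return True
        sleep(interval)
    return False
```

A test would call this helper before its final assertions and fail if it returns False, instead of assuming the workers are gone as soon as the setting changes.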
>
>
> If it's not obvious: This isn't ready, should be reverted, cleaned up,
> and re-submitted for v12.
>

While I do think that it's still definitely fixable in time for 11, I won't
argue for it. Will revert.

Note however that I'm sans-laptop until Sunday, so I will revert it then or
possibly Monday.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>
