Re: Online enabling of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Online enabling of checksums
Date: 2018-04-05 09:33:00
Message-ID: f48c0567-906f-7e6f-0740-78a91d3bb957@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/5/18 11:07 AM, Magnus Hagander wrote:
>
>
> On Wed, Apr 4, 2018 at 12:11 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
> ...
>
> It however still fails to initialize the attempts field after allocating
> the db entry in BuildDatabaseList, so if you try running with
> -DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:
>
>  WARNING:  attempts = -1684366952
>  WARNING:  attempts = 1010514489
>  WARNING:  attempts = -1145390664
>  WARNING:  attempts = 1162101570
>
> I guess those are not the droids we're looking for?
>
>
> When looking at that and after a quick discussion, we just decided it's
> better to completely remove the retry logic. As you mentioned in some
> earlier mail, we had all this logic to retry databases (unlikely) but
> not relations (likely). Attached patch simplifies it to only detect the
> "database was dropped" case (which is fine), and consider every other
> kind of failure a permanent one and just not turn on checksums in those
> cases.
>

OK, works for me.

>
>
> Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
> initialized. I think it only ever gets set in launcher_exit(), but that
> does not seem sufficient. I suspect it's the reason for this behavior:
>
>
> It's supposed to get initialized in ChecksumHelperShmemInit() -- fixed.
> (The whole memset-to-zero)
>

OK, seems fine now.

>
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  database "template0" does not allow connections
>     HINT:  Allow connections using ALTER DATABASE and try again.
>     test=# alter database template0 allow_connections = true;
>     ALTER DATABASE
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  could not start checksumhelper: already running
>     test=# select pg_disable_data_checksums();
>      pg_disable_data_checksums
>     ---------------------------
>
>     (1 row)
>
>     test=# select pg_enable_data_checksums(10, 10);
>     ERROR:  could not start checksumhelper: has been cancelled
>
>
> Turns out that wasn't the problem. The problem was that we *set* it
> before erroring out with the "does not allow connections", but never
> cleared it. In that case, it would be listed as launcher-is-running even
> though the launcher was never started.
>
> Basically the check for datallowconn was put in the wrong place. That
> check should go away completely once we merge (because we should also
> merge the part that allows us to bypass it), but for now I have moved
> the check to the correct place.
>

Ah, OK. I was just guessing.

>
>
> Attached is a diff with a couple of minor comment tweaks, and correct
> initialization of the attempts field.
>
>
> Thanks. This is included in the attached update, along with the above
> fixes and some other small touches from Daniel. 
>

This patch version seems fine to me. I'm inclined to mark it RFC.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-04-05 10:09:31 Re: pgsql: New files for MERGE
Previous Message Amit Langote 2018-04-05 09:24:58 Re: [HACKERS] Add support for tuple routing to foreign partitions