Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: David Christensen <david(at)endpoint(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure
Date: 2017-02-23 06:41:00
Message-ID: ea820300-174c-53c7-ec67-c8d1ea35daf9@BlueTreble.com
Lists: pgsql-hackers

On 2/20/17 11:22 AM, David Christensen wrote:
>> - If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections would need to get enabled.
> Yeah, the workaround for now would be to require “datallowconn” to be set to ‘t’ for all databases before proceeding, unless there’s a way to connect to those databases internally that bypasses that check. Open to ideas, though for a first pass it seems like the “datallowconn” approach is the least amount of work.

The problem with ignoring datallowconn is that any database where it's
false is fair game for CREATE DATABASE. I think just enforcing that
everything's connectable is good enough for now.
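
To be concrete, I'm imagining the enabling side doing something like
the following against pg_database before it starts (just a sketch, not
anything from your patch; the function name, errcode and messages are
all mine):

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "catalog/pg_database.h"

/*
 * Hypothetical pre-flight check: refuse to start cluster-wide checksum
 * enabling unless every database allows connections, since the worker
 * has to connect to each one to rewrite its pages.
 */
static void
check_all_databases_connectable(void)
{
    Relation    rel;
    HeapScanDesc scan;
    HeapTuple   tup;

    rel = heap_open(DatabaseRelationId, AccessShareLock);
    scan = heap_beginscan_catalog(rel, 0, NULL);

    while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        Form_pg_database db = (Form_pg_database) GETSTRUCT(tup);

        if (!db->datallowconn)
            ereport(ERROR,
                    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                     errmsg("database \"%s\" does not allow connections",
                            NameStr(db->datname)),
                     errhint("Set datallowconn on all databases before enabling checksums.")));
    }

    heap_endscan(scan);
    heap_close(rel, AccessShareLock);
}

That keeps the first pass simple; anything fancier (connecting while
ignoring datallowconn) can come later.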

>> I like the idea of revalidation, but I'd suggest leaving that off of the first pass.
> Yeah, agreed.
>
>> It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating shared catalogs as their own database). All normal backends do per-database stuff (such as setting current_database) during startup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still the question of SLRUs (or are those not checksum'd today??).
> So you’re suggesting that the data_checksums GUC get set in a per-database context, so once it’s fully enabled in a specific database, that database treats checksums as being in an enforcing state, even if the rest of the cluster hasn’t completed? Hmm, might think on that a bit, but it seems pretty straightforward.

Something like that, yeah.
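
To make sure we're picturing the same thing, roughly this (purely
illustrative, nothing from the patch; all the names are mine): each
database carries a tri-state checksum flag, and verification only
kicks in once that database is fully converted:

/*
 * Illustrative per-database checksum states.  Since the state changes
 * are WAL-logged, a standby sees the same state and knows which
 * databases' pages it may verify.
 */
typedef enum ChecksumState
{
    CHECKSUM_STATE_OFF,         /* checksums neither written nor verified */
    CHECKSUM_STATE_IN_PROGRESS, /* checksums written as pages are rewritten,
                                 * but not yet verified on read */
    CHECKSUM_STATE_ON           /* every page checksummed; verify on read */
} ChecksumState;

/* Verify a page only once its database is fully converted. */
static inline bool
checksum_verification_enabled(ChecksumState state)
{
    return state == CHECKSUM_STATE_ON;
}

Shared catalogs would just be one more "database" in that scheme.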

> What issues do you see affecting replication and recovery specifically (other than the entire cluster not being complete)? Since the checksum changes are WAL-logged, it seems you'd be no worse for the wear on a standby if you had to change things.

I'm specifically worried about the entire cluster not being complete.
That makes it harder for replicas to know what blocks they can and can't
verify the checksum on.

That *might* still be simpler than trying to handle converting the
entire cluster in one shot. If it's not simpler I certainly wouldn't do
it right now.

>> BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary compatibility. If we had a method for handling that it would probably be useful for enabling checksums as well. You'd essentially treat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with the potential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be had in this area before tackling the "page too small" problem.
> I agree it’s very similar; my issue is I don’t want to have to postpone handling a specific case for some future infrastructure.

Yeah, I was just mentioning it.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
