Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure

From: David Christensen <david(at)endpoint(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Add pg_disable_checksums() and supporting infrastructure
Date: 2017-02-20 17:22:48
Message-ID: 39130985-8BDD-46CD-8CB6-B8B9E90CC3B7@endpoint.com
Lists: pgsql-hackers


> On Feb 19, 2017, at 8:14 PM, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> wrote:
>
> On 2/19/17 11:02 AM, David Christensen wrote:
>> My design notes for the patch were submitted to the list with little comment; see: https://www.postgresql.org/message-id/1E6E64E9-634B-43F4-8AA2-CD85AD92D2F8%40endpoint.com
>>
>> I have since added the WAL logging of checksum states; however, I’d be glad to take feedback on the other proposed approaches (particularly the system catalog changes + the concept of a checksum cycle).
>
> A couple notes:
>
> - AFAIK unlogged tables get checksummed today if checksums are enabled; the same should hold true if someone enables checksums on the whole cluster.

Agreed; AFAIK this should already work for anything going through the PageIsVerified() API, since I effectively just modified the logic there to depend on the cluster’s checksum state.

> - Shared relations should be handled as well; you don't mention them.

I agree that they should be handled as well; I thought I had mentioned it later in the design doc, though TBH I’m not sure there is more involved than visiting the global relations in pg_class. In addition we need to visit all forks of each relation. I’m not certain whether loading those into shared_buffers would be sufficient; I think so, but I’d be glad to be informed otherwise. (As long as they’re already using the PageIsVerified() API call, they get this logic for free.)

> - If an entire cluster is going to be considered as checksummed, then even databases that don't allow connections would need to get enabled.

Yeah, the workaround for now would be to require "datallowconn" to be set to 't' for all databases before proceeding, unless there’s a way to connect to those databases internally that bypasses that check. Open to ideas, though for a first pass the "datallowconn" approach seems like the least amount of work.

> I like the idea of revalidation, but I'd suggest leaving that off of the first pass.

Yeah, agreed.

> It might be easier on a first pass to look at supporting per-database checksums (in this case, essentially treating shared catalogs as their own database). All normal backends do per-database stuff (such as setting current_database) during startup anyway. That doesn't really help for things like recovery and replication though. :/ And there's still the question of SLRUs (or are those not checksum'd today??).

So you’re suggesting that the data_checksums GUC be set in a per-database context, so that once checksums are fully enabled in a specific database, that database is treated as in the enforcing state even if the rest of the cluster hasn’t completed? Hmm, I’ll think on that a bit, but it seems pretty straightforward.

What issues do you see affecting replication and recovery specifically (other than the entire cluster not being complete)? Since the checksum state changes are WAL-logged, it seems you’d be none the worse for wear on a standby if you had to change things.

> BTW, it occurs to me that this is related to the problem we have with trying to make changes that break page binary compatibility. If we had a method for handling that it would probably be useful for enabling checksums as well. You'd essentially treat an un-checksum'd page as if it was an "old page version". The biggest problem there is dealing with the potential that the new page needs to be larger than the old one was, but maybe there's some useful progress to be had in this area before tackling the "page too small" problem.

I agree it’s very similar; my concern is that I don’t want to postpone handling this specific case for some future infrastructure. That said, the piece I could see generalizing is the part that walks all pages in the cluster: as long as page checksum/format validation happens at page read time, we could do page version checks/conversions at the same time. The only special code we’d need is for tracking which files still need to be visited and for minimizing the impact on the cluster via some sort of throttling/leveling. (It’s also similar to vacuum in that way; however, we have been going out of our way to make vacuum smart enough to *avoid* visiting every page, so I think it is a different enough use case that we can’t tie the two systems together, nor do I feel like taking that project on.)

We could call the checksum_cycle something else (page_walk_cycle? bikeshed time!) and basically get the infrastructure to initiate online/gradual conversion/processing of all pages for free.

Thoughts?

David
--
David Christensen
End Point Corporation
david(at)endpoint(dot)com
785-727-1171
