Re: [DESIGN] Incremental checksums

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: David Christensen <david(at)endpoint(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: [DESIGN] Incremental checksums
Date: 2015-07-13 22:49:20
Message-ID: 55A44070.7040802@BlueTreble.com
Lists: pgsql-hackers

On 7/13/15 4:02 PM, David Christensen wrote:
>
>> On Jul 13, 2015, at 3:50 PM, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com> wrote:
>>
>> On 7/13/15 3:26 PM, David Christensen wrote:
>>> * Incremental Checksums
>>>
>>> PostgreSQL users should have a way of upgrading their cluster to use data checksums without having to do a costly pg_dump/pg_restore; in particular, checksums should be able to be enabled/disabled at will, with the database enforcing the logic of whether the pages considered for a given database are valid.
>>>
>>> Considered approaches for this are having additional flags to pg_upgrade to set up the new cluster to use checksums where they did not before (or optionally turning these off). This approach is a nice tool to have, but in order to be able to support this process in a manner which has the database online while the database is going through the initial checksum process.
>>
>> It would be really nice if this could be extended to handle different page formats as well, something that keeps rearing its head. Perhaps that could be done with the cycle idea you've described.
>
> I had had this thought too, but the main issues I saw were that new page formats were not guaranteed to take up the same space/storage, so there was an inherent limitation on the ability to restructure things out *arbitrarily*; that being said, there may be a use-case for the types of modifications that this approach *would* be able to handle.

After some discussion on IRC, I think there are 2 main points to consider.

First, we're currently unhappy with how relfrozenxid works, and this
proposal follows the same pattern of having essentially a counter field
in pg_class. Perhaps this is OK because things like checksums really
shouldn't change that often. (My inclination is that fields in pg_class
are OK for now.)

Second, there are 4 use cases here that are very similar. We should
*consider* them now, while designing this. That doesn't mean the first
patch needs to support anything other than checksums.

1) Page layout changes
2) Page run-time changes (currently only checksums)
3) Tuple layout changes (e.g., HEAP_MOVED_IN)
4) Tuple run-time changes (e.g., DROP COLUMN)

1 is currently handled in pg_upgrade by forcing a page-by-page copy
during upgrade. Doing this online would require the same kind of
conversion plugin pg_upgrade uses. If we want to support conversions
that need extra free space on a page, we'd also need support for that.

2 is similar to 1, except this can change via GUC or similar. Checksums
are an example of this, as is creating extra free space on a page to
support an upgrade.

3 & 4 are tuple-level equivalents to 1 & 2.
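
To make the symmetry concrete, here is a rough sketch that treats the 4 cases as two axes: what is being converted (page or tuple) and what triggers the change (layout or run-time). All of the names below (Level, Trigger, ConversionKind, USE_CASES, tuple_equivalent) are made up for illustration and do not exist in PostgreSQL; only the example strings come from the list above.

```python
from enum import Enum
from typing import NamedTuple


class Level(Enum):
    PAGE = "page"
    TUPLE = "tuple"


class Trigger(Enum):
    LAYOUT = "layout"     # the on-disk layout itself changes
    RUNTIME = "run-time"  # changes on a running cluster (GUC or similar)


class ConversionKind(NamedTuple):
    level: Level
    trigger: Trigger
    example: str


# Hypothetical table mirroring the numbered list in this message.
USE_CASES = {
    1: ConversionKind(Level.PAGE, Trigger.LAYOUT, "page layout change"),
    2: ConversionKind(Level.PAGE, Trigger.RUNTIME, "checksums"),
    3: ConversionKind(Level.TUPLE, Trigger.LAYOUT, "HEAP_MOVED_IN"),
    4: ConversionKind(Level.TUPLE, Trigger.RUNTIME, "DROP COLUMN"),
}


def tuple_equivalent(case: int) -> int:
    """Return the tuple-level case that shares a trigger with the given case."""
    kind = USE_CASES[case]
    for number, other in USE_CASES.items():
        if other.level is Level.TUPLE and other.trigger is kind.trigger:
            return number
    raise KeyError(case)
```

With this table, tuple_equivalent(1) gives 3 and tuple_equivalent(2) gives 4, which is all that "3 & 4 are tuple-level equivalents to 1 & 2" is saying.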

I think the bigger challenge to these things is how to track the status
of a conversion (as opposed to the conversion function itself).

- Do we want each of these to have a separate counter in pg_class?
(rellastchecksum, reloldestpageversion, etc)

- Should that info be combined?
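
A minimal sketch of the two options, only to frame the question. It assumes each counter records the latest conversion cycle for which the whole relation is known to be converted; that meaning is my assumption and is not settled anywhere in this thread. rellastchecksum and reloldestpageversion are the names floated above; everything else (the classes, the progress map, the helper functions) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RelStatusSeparate:
    """Option A: one dedicated counter per conversion, pg_class-style."""
    rellastchecksum: int = 0       # assumed: last fully completed checksum cycle
    reloldestpageversion: int = 0  # assumed: oldest page version still present


@dataclass
class RelStatusCombined:
    """Option B: a single combined field keyed by conversion name."""
    progress: Dict[str, int] = field(default_factory=dict)


def is_converted_separate(rel: RelStatusSeparate, targets: Dict[str, int]) -> bool:
    # Every tracked counter must have caught up with its target value.
    return all(getattr(rel, name) >= target for name, target in targets.items())


def is_converted_combined(rel: RelStatusCombined, targets: Dict[str, int]) -> bool:
    # A conversion missing from the map counts as not started (0).
    return all(rel.progress.get(name, 0) >= target for name, target in targets.items())
```

In this sketch, option A needs a new field for every new kind of conversion, while option B can track a new conversion without changing the record's shape, at the cost of a less self-describing field.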
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
