Re: [WIP] In-place upgrade

From: Decibel! <decibel(at)decibel(dot)org>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] In-place upgrade
Date: 2008-11-10 00:12:21
Message-ID: 75081FC9-2909-496E-BAFB-2541636F66DA@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Nov 6, 2008, at 1:31 PM, Bruce Momjian wrote:
>> 3. What about multi-release upgrades? Say someone wants to upgrade
>> from 8.3 to 8.6. 8.6 only knows how to read pages that are
>> 8.5-and-a-half or better, 8.5 only knows how to read pages that are
>> 8.4-and-a-half or better, and 8.4 only knows how to read pages that
>> are 8.3-and-a-half or better. So the user will have to upgrade to
>> 8.3.MAX, then 8.4.MAX, then 8.5.MAX, and then 8.6.
>
> Yes.

I think that's pretty seriously un-desirable. It's not at all
uncommon for databases to stick around for a very long time and then
jump ahead many versions. I don't think we want to tell people they
can't do that.

More importantly, I think we're barking up the wrong tree by putting
migration knowledge into old versions. All that the old versions need
to do is guarantee a specific amount of free space per page. We
should provide a mechanism to tell a cluster what that free space
requirement is, and not hard-code it into the backend.

Unless I'm mistaken, there are only two cases we care about for
additional space: per-page and per-tuple. Those requirements could
also vary for different types of pg_class objects. What we need is an
API that allows an administrator to tell the database to start
setting this space aside. One possibility:

pg_min_free_space( version, relkind, bytes_per_page, bytes_per_tuple );
pg_min_free_space_index( version, indexkind, bytes_per_page,
bytes_per_tuple );

version: This would be provided as a safety mechanism. You would have
to provide the major version that matches what the backend is
running. See below for an example.

relkind: Essentially, heap vs toast, though I suppose it's possible
we might need this for sequences.

indexkind: Because we support different types of indexes, I think we
need to handle them differently than heap/toast. If we wanted, we
could have a single function that demands that indexkind is NULL if
relkind != 'index'.

bytes_per_(page|tuple): obvious. :)

Once we have an API, we need to get users to make use of it. I'm
thinking add something like the following to the release notes:

"To upgrade from a prior version to 8.4, you will need to run some of
the following commands, depending on what version you are currently
using:

For version 8.3:
SELECT pg_min_free_space( '8.3', 'heap', 4, 12 );
SELECT pg_min_free_space( '8.3', 'toast', 4, 12 );

For version 8.2:
SELECT pg_min_free_space( '8.2', 'heap', 14, 12 );
SELECT pg_min_free_space( '8.2', 'toast', 14, 12 );
SELECT pg_min_free_space_index( '8.2', 'b-tree', 4, 4);"

(Note I'm just pulling numbers out of thin air in this example.)

As you can see, we pass in the version number to ensure that if
someone accidentally cut and pastes the wrong stuff they know what
they did wrong immediately.

One downside to this scheme is that it doesn't provide a mechanism to
ensure that all required minimum free space requirements were passed
in. Perhaps we want a function that takes an array of complex types
and forces you to supply information for all known storage
mechanisms. Another possibility would be to pass in some kind of
binary format that contains a checksum.

Even if we do come up with a pretty fool-proof way to tell the old
version what free space it needs to set aside, I think we should
still have a mechanism for the new version to know exactly what the
old version has set aside, and if it's actually been accomplished or
not. One option that comes to mind is to add min_free_space_per_page
and min_free_space_per_tuple to pg_class. Normally these fields would
be NULL; the old version would only set them once it had verified
that all pages in a given relation met those requirements (presumably
via vacuum). The new version would check all these values on startup
to ensure they made sense.

OTOH, we might not want to go mucking around with changing the
catalog for older versions (I'm not even sure if we can). So perhaps
it would be better to store this information in a separate table, or
maybe a separate file. That might be best anyway; we generally
wouldn't need this information, so it would be nice if it wasn't
bloating pg_class all the time.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Decibel! 2008-11-10 00:25:50 Re: ALTER DATABASE SET TABLESPACE vs crash safety
Previous Message Simon Riggs 2008-11-10 00:11:21 Re: Reducing some DDL Locks to ShareLock