Re: [WIP] In-place upgrade

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] In-place upgrade
Date: 2008-11-06 17:15:09
Message-ID: 200811061715.mA6HF9g00508@momjian.us
Lists: pgsql-hackers

Robert Haas wrote:
> > That's all fine and dandy, except that it presumes that you can perform
> > SELECT/UPDATE/DELETE on V3 tuple versions; you can't just pretend that
> > A-E aren't there until they get converted. Which is exactly the
> > overhead we were looking to avoid.
>
> I don't understand this comment at all. Unless you have some sort of
> magical wand in your back pocket that will instantaneously transform
> the entire database, there is going to be a period of time when you
> have to cope with both V3 and V4 pages. ISTM that what we should be
> talking about here is:
>
> (1) How are we going to do that in a way that imposes near-zero
> overhead once the entire database has been converted?
> (2) How are we going to do that in a way that is minimally invasive to the code?
> (3) Can we accomplish (1) and (2) while still retaining somewhat
> reasonable performance for V3 pages?
>
> Zdenek's initial proposal did this by replacing all of the tuple
> header macros with functions that were conditionalized on page
> version. I think we agree that's not going to work. That doesn't
> mean that there is no approach that can work, and we were discussing
> possible ways to make it work upthread until the thread got hijacked
> to discuss the right way of handling page expansion. Now that it
> seems we agree that a transaction can be used to move tuples onto new
> pages, I think we'd be well served to stop talking about page
> expansion and get back to the original topic: where and how to insert
> the hooks for V3 tuple handling.

I think the above is a good summary. For me, the problem with any
approach that has information about prior-version block formats in the
main code path is code complexity, and secondarily performance.

I know there is concern that converting all blocks on read-in might
expand the page beyond 8k in size. One idea Heikki had was to require
that some tool be run on the old release before a major upgrade to
guarantee there is enough free space to convert each block to the current
format on read-in, which would localize the information about prior
block formats. We could release the tool in minor branches around the
same time as a major release. Also consider that there are very few
releases that expand the page size.
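
As a rough illustration of what such a pre-upgrade tool would have to
check, here is a minimal sketch. It is not PostgreSQL code: the Page
class, the pages_needing_space function, and the growth figures are all
hypothetical, and it assumes the only growth comes from a fixed number
of extra bytes per tuple plus an optional fixed page-header increase.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """Hypothetical summary of one old-format page."""
    free_bytes: int   # unused space currently available on the page
    tuple_count: int  # number of tuples stored on the page


def pages_needing_space(pages: List[Page],
                        growth_per_tuple: int,
                        page_header_growth: int = 0) -> List[int]:
    """Return the indexes of pages that cannot be converted in place.

    A page qualifies for conversion on read-in only if its free space
    covers the assumed growth of every tuple plus the page header.
    """
    short = []
    for index, page in enumerate(pages):
        needed = page.tuple_count * growth_per_tuple + page_header_growth
        if needed > page.free_bytes:
            short.append(index)
    return short
```

A tool along these lines would report (or fix, by moving tuples
elsewhere) every page returned by the check, so that the new release
can assume conversion on read-in always fits.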

For these reasons, the expand-the-page-beyond-8k problem should not be
dictating what approach we take for upgrade-in-place because there are
workarounds for the problem, and the problem is rare. I would like us
to again focus on converting the pages to the current version format on
read-in, and perhaps a tool to convert all old pages to the new format.

FYI, we are also going to need the ability to convert all pages to the
current format for multi-release upgrades. For example, if you did
upgrade-in-place from 8.2 to 8.3, you are going to need to update all
pages to the 8.3 format before doing upgrade-in-place to 8.4; perhaps
vacuum can do something like this on a per-table basis, and we can
record that status in a pg_class column.
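
To make the multi-release requirement concrete, here is a small sketch
of the per-table check. The function name and the dictionary layout are
made up for illustration; in a real system the per-table status would
come from something like the pg_class column suggested above, not from
scanning version numbers in Python.

```python
from typing import Dict, List


def tables_with_old_pages(page_versions: Dict[str, List[int]],
                          current_version: int) -> Dict[str, int]:
    """Map each table that still has old-format pages to their count.

    page_versions maps a table name to the layout version of each of
    its pages. An empty result means every table is fully converted,
    so the next upgrade-in-place could be allowed.
    """
    stale = {}
    for table, versions in page_versions.items():
        old_pages = sum(1 for v in versions if v != current_version)
        if old_pages:
            stale[table] = old_pages
    return stale
```

The next upgrade-in-place would be refused while this returns anything,
which keeps each release responsible for reading only the immediately
preceding page format.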

Also, consider that when we did PITR, we required commands before and
after the tar so that there was a consistent API for PITR, and later had
to add capabilities to those functions, but the user API didn't change.

I envision a similar system where we have utilities to guarantee all
pages have enough free space, and all pages are the current version,
before allowing an upgrade-in-place to the next version. Such a
consistent API will make the job for users easier and our job simpler,
and with upgrade-in-place, where we have limited time and resources to
code this for each release, simplicity is important.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
