Re: Page-level version upgrade (was: Block-level CRC checks)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, decibel <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Date: 2009-12-02 15:48:17
Message-ID: 603c8f070912020748o4dcd36fcm95d37d3064bfcca5@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 1, 2009 at 11:45 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> >> The key issue, as I think Heikki identified at the time, is to figure
>> >> out how you're eventually going to get rid of the old pages.  He
>> >> proposed running a pre-upgrade utility on each page to reserve the
>> >> right amount of free space.
>> >>
>> >> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00208.php
>> >
>> > Right.  There were two basic approaches to handling a page that would
>> > expand when upgraded to the new version --- either allow the system to
>> > write the old format, or have a pre-upgrade script that moved tuples so
>> > there was guaranteed enough free space in every page for the new format.
>> > I think we agreed that the latter was better than the former, and it was
>> > easy because we don't have any need for that at this time.  Plus the
>> > script would not rewrite every page, just certain pages that required
>> > it.
>>
>> While I'm always willing to be proven wrong, I think it's a complete
>> dead-end to believe that it's going to be easier to reserve space for
>> page expansion using the upgrade-from version rather than the
>> upgrade-to version.  I am firmly of the belief that the NEW pg version
>> must be able to operate on an unmodified heap migrated from the OLD pg
>> version.  After this set of patches was rejected, Zdenek actually
>
> Does it need to write the old version, and if it does, it has to carry
> around the old format structures all over the backend?  That was the
> unclear part.

I think it needs partial write support for the old version. If the
page is not expanding, then you can probably just replace pages in
place. But if the page is expanding, then you need to be able to move
individual tuples[1]. Since you want to be up and running while
that's happening, I think you probably need to be able to update xmax
and probably set hint bits. But you don't need to be able to add
tuples to the old page format, and I don't think you need complete
vacuum support, since you don't plan to reuse the dead space - you'll
just recycle the whole page once the tuples are all dead.
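To make "partial write support" concrete, here is a minimal C sketch. All of its names (OldPageOp, old_page_op_allowed, old_page_recyclable) are hypothetical and illustrative, not PostgreSQL APIs. It encodes the policy described above:

- Existing tuple headers on an old-format page may be updated in place (xmax, hint bits).
- Nothing may add tuples to the page or reclaim space on it.
- The page becomes recyclable only once every tuple on it is dead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operations the new server might attempt on a heap page
 * that is still in the OLD on-disk format (names are illustrative). */
typedef enum OldPageOp
{
    OLDPAGE_READ_TUPLE,      /* fetch a tuple during a scan */
    OLDPAGE_SET_XMAX,        /* mark an existing tuple deleted/updated */
    OLDPAGE_SET_HINT_BITS,   /* record commit-status hints on a tuple */
    OLDPAGE_INSERT_TUPLE,    /* add a new tuple to the page */
    OLDPAGE_VACUUM_COMPACT   /* defragment the page and reuse dead space */
} OldPageOp;

/* Partial write support: in-place changes to existing tuple headers are
 * allowed; anything that adds tuples or reuses space on the page is not. */
static bool
old_page_op_allowed(OldPageOp op)
{
    switch (op)
    {
        case OLDPAGE_READ_TUPLE:
        case OLDPAGE_SET_XMAX:
        case OLDPAGE_SET_HINT_BITS:
            return true;
        case OLDPAGE_INSERT_TUPLE:
        case OLDPAGE_VACUUM_COMPACT:
            return false;
    }
    return false;
}

/* An old-format page is never compacted.  Once every tuple on it is dead
 * (moved elsewhere or deleted), the whole page can be recycled and
 * reinitialized in the new format. */
static bool
old_page_recyclable(const bool *tuple_dead, int ntuples)
{
    assert(ntuples >= 0);
    for (int i = 0; i < ntuples; i++)
        if (!tuple_dead[i])
            return false;
    return true;
}
```

The design choice is that no vacuum logic for the old format is needed. Dead space on an old page is simply left alone until the entire page can be thrown away.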

As for carrying it around the whole backend, I'm not sure how much of
the backend really needs to know. It would only be code that
looks at pages, rather than, say, tuples, but I don't really know how
much code that touches. I suppose that's one of the things we need to
figure out.
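One way to limit how much of the backend has to know is to route every page access through a single format dispatch, so tuple-level code never inspects the layout itself. The sketch below uses a simplified, hypothetical header struct. It mirrors the fact that PostgreSQL's page header keeps a layout version in the low byte of pd_pagesize_version. The version constants, header sizes, and function names are illustrative assumptions only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical page header with only the field needed here.
 * The low byte holds the layout version; the high bits hold the page size. */
typedef struct SketchPageHeader
{
    uint16_t pagesize_version;
} SketchPageHeader;

#define SKETCH_OLD_LAYOUT 1   /* illustrative values, not real ones */
#define SKETCH_NEW_LAYOUT 2

static uint8_t
page_layout_version(const SketchPageHeader *hdr)
{
    return (uint8_t) (hdr->pagesize_version & 0x00FF);
}

/* Per-format description: the only place that knows the on-disk layout. */
typedef struct PageFormatOps
{
    const char *name;
    size_t      header_size;   /* bytes of page header in this format */
} PageFormatOps;

/* Illustrative sizes: the new format adds 4 bytes, e.g. for a CRC. */
static const PageFormatOps old_format = {"old", 24};
static const PageFormatOps new_format = {"new", 28};

/* Returns NULL for an unrecognized layout so the caller can report the
 * page as damaged instead of misreading it. */
static const PageFormatOps *
page_format_ops(const SketchPageHeader *hdr)
{
    assert(hdr != NULL);
    switch (page_layout_version(hdr))
    {
        case SKETCH_OLD_LAYOUT:
            return &old_format;
        case SKETCH_NEW_LAYOUT:
            return &new_format;
        default:
            return NULL;
    }
}

/* Example of format-independent caller code: it asks the dispatch layer
 * instead of hard-coding a header size. */
static size_t
page_usable_bytes(const SketchPageHeader *hdr, size_t blcksz)
{
    const PageFormatOps *ops = page_format_ops(hdr);

    return ops ? blcksz - ops->header_size : 0;
}
```

With 8192-byte pages, page_usable_bytes returns 8168 for an old-layout page and 8164 for a new-layout one. How many real call sites would need to go through such a layer is exactly the open question above.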

[1] Unless, of course, you use a pre-upgrade utility. But this is
about how to make it work WITHOUT a pre-upgrade utility.

>> proposed an alternate patch that would have allowed space reservation,
>> and it was rejected precisely because there was no clear certainty
>> that it would solve any hypothetical future problem.
>
> True.  It was solving a problem we didn't have, yet.

Well, that's sort of a circular argument. If you're going to reserve
space with a pre-upgrade utility, you're going to need to put the
pre-upgrade utility into the version you want to upgrade FROM. If we
wanted to be able to use a pre-upgrade utility to upgrade to 8.5, we
would have had to put the utility into 8.4.

The problem I'm referring to is that there is no guarantee that you
would be able to predict how much space to reserve. In a case like CRCs,
it may be as simple as "4 bytes". But what if, say, we switch to a
different compression algorithm for inline toast? Some pages will
contract, others will expand, but there's no saying by how much - and
therefore no fixed amount of reserved space is guaranteed to be
adequate. It's true that we might never want to do that particular
thing, but I don't think we can say categorically that we'll NEVER
want to do anything that expands pages by an unpredictable amount. So
it might be quite complex to figure out how much space to reserve on
any given page. If we can find a way to make that the NEW PG
version's problem, it's still complicated, but at least it's not
complicated stuff that has to be backpatched.
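A minimal sketch of the in-place fit check shows the difference between a fixed and a content-dependent expansion. All names here are hypothetical.

- **Fixed change:** a page-level change such as a 4-byte CRC only grows the header, so it is captured by new_header_size.
- **Content-dependent change:** something like a hypothetical recompression changes each tuple by an amount that depends on its contents. converted_size then has to be evaluated tuple by tuple. No single reservation figure computed in advance is guaranteed to suffice unless the old version knows the new conversion rules.

Line pointers and alignment padding are ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BLCKSZ 8192   /* PostgreSQL's default block size */

/* Hypothetical per-tuple conversion: given a tuple's size in the old
 * format, return its size in the new format. */
typedef size_t (*ConvertedSizeFn)(size_t old_size);

/* A change that leaves tuple sizes alone (e.g. a header-only CRC). */
static size_t
unchanged_size(size_t old_size)
{
    return old_size;
}

/* Hypothetical content-dependent change, modeled crudely as 10% growth;
 * a real recompression could shrink some tuples and grow others. */
static size_t
recompressed_size(size_t old_size)
{
    return old_size + old_size / 10;
}

/* Decide whether the page can be rewritten in place or whether some
 * tuples must first be moved to another page.  Ignores line pointers
 * and alignment padding. */
static bool
page_fits_after_conversion(size_t new_header_size,
                           const size_t *old_tuple_sizes, int ntuples,
                           ConvertedSizeFn converted_size)
{
    size_t total = new_header_size;

    assert(ntuples >= 0);
    for (int i = 0; i < ntuples; i++)
        total += converted_size(old_tuple_sizes[i]);
    return total <= SKETCH_BLCKSZ;
}
```

Consider two 4000-byte tuples on a page whose new header is 28 bytes. The unchanged_size case fits (8028 bytes). The hypothetical 10% growth does not (8828 bytes), so some tuples would have to move.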

Another problem with a pre-upgrade utility is - how do you verify,
when you fire up the new cluster, that the pre-upgrade utility has
done its thing? If the new PG version requires 4 bytes of space
reserved on each page, what happens when you get halfway through
upgrading your 1TB database and find a page with only 2 bytes
available? There aren't a lot of good options. The old PG version
could try to mark the DB in some way to indicate whether it
successfully completed, but what if there's a bug and something was
missed? Then you have this scenario:

1. Run the pre-upgrade script.
2. pg_migrator.
3. Fire up new version.
4. Discover that pre-upgrade script forgot to reserve enough space on some page.
5. Report a bug.
6. Bug fixed, new version of pre-upgrade script is now available.
7. ???
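Detecting the problem in step 4 is the easy part. Below is a minimal sketch of such a check in the new server. The names are hypothetical, and it assumes each page's free space can be read from its header. It can only report which pages are short. It cannot restore the space the pre-upgrade script failed to reserve, which is exactly the gap that step 7 leaves open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of checking one relation after a pre-upgrade pass. */
typedef struct ReserveCheckResult
{
    size_t pages_checked;
    size_t pages_short;        /* pages with less free space than required */
    long   first_short_page;   /* block number of first short page, or -1 */
} ReserveCheckResult;

/* Verify that every page reserved at least `required` bytes.
 * free_bytes[blk] stands in for the free space read from page blk. */
static ReserveCheckResult
check_reserved_space(const size_t *free_bytes, size_t npages, size_t required)
{
    ReserveCheckResult r = {0, 0, -1};

    assert(free_bytes != NULL || npages == 0);
    for (size_t blk = 0; blk < npages; blk++)
    {
        r.pages_checked++;
        if (free_bytes[blk] < required)
        {
            if (r.first_short_page < 0)
                r.first_short_page = (long) blk;
            r.pages_short++;
        }
    }
    return r;
}
```

Take per-page free space of {8, 4, 2, 16} bytes and a 4-byte requirement. The scan reports one short page, block 2, the "only 2 bytes available" case from the paragraph above.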

If all the logic is in the new server, you may still be in hot water
when you discover that it can't deal with a particular case. But
hopefully the problem would be confined to that page, or that
relation, and you could use the rest of your database. And even if
not, when the bug is fixed, you are patching the version that you're
still running and not the version that you've already left behind and
can't easily go back to. Of course if the bug is bad enough it can
fry your database under any design, but in the pre-upgrade script
design you have to be really, really confident that the pre-upgrade
script doesn't have any holes that will only be found after it's too
late.

...Robert
