Re: Proposal: Multiversion page api (inplace upgrade)

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 15:42:57
Message-ID: 484FF281.3030104@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>> (this won't come as a surprise as we talked about this at PGCon, but) I
>> think we should rather convert the page structure to new format in
>> ReadBuffer the first time a page is read in. That would keep the changes
>> a lot more isolated.
>
> The problem is that ReadBuffer is an extremely low-level environment,
> and it's not clear that it's possible (let alone practical) to do a
> conversion at that level in every case.

Well, we can't predict the future, and can't guarantee that it's
possible or practical to do the things we need to do in the future no
matter what approach we choose.

> In particular it hardly seems
> sane to expect ReadBuffer to do tuple content conversion, which is going
> to be practically impossible to perform without any catalog accesses.

ReadBuffer has access to the Relation, which tells it what kind of
relation it's dealing with and carries the TupleDesc. That should get us
pretty far. It would be a modularity violation, for sure, but I could
live with that for the purpose of page version conversion.
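
To make the convert-on-first-read idea concrete, here is a minimal
sketch in C. It is not PostgreSQL code: the page layout (one version
byte, one used-length byte, and in version 2 an extra flag byte) and
every name prefixed with toy_ or Toy are assumptions invented for
illustration. It only shows the shape of the approach: check the
version when the page is read, pick a converter based on the relation
kind, and report failure when the converted data would not fit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 64        /* toy size, far smaller than a real page */
#define TOY_CURRENT_VERSION 2

/* Hypothetical relation kinds; stands in for what the Relation would tell us. */
typedef enum { TOY_REL_HEAP, TOY_REL_INDEX } ToyRelKind;

/*
 * Hypothetical page image.
 *   v1: bytes[0] = version, bytes[1] = used payload length, payload at 2.
 *   v2: same, plus bytes[2] = flags, payload at 3.
 */
typedef struct {
    uint8_t bytes[TOY_PAGE_SIZE];
} ToyPage;

/* Convert a v1 heap page in place. Returns 0 on success, -1 if the
 * payload no longer fits once the header has grown by one byte. */
static int toy_convert_heap_v1_to_v2(ToyPage *page)
{
    uint8_t used = page->bytes[1];

    if (used > TOY_PAGE_SIZE - 3)
        return -1;              /* the "lack of space" case; page untouched */

    memmove(&page->bytes[3], &page->bytes[2], used);
    page->bytes[2] = 0;         /* new flags byte */
    page->bytes[0] = 2;         /* mark as converted */
    return 0;
}

/* Called right after a page image has been read in. Returns 0 if the
 * page is now in the current format, -1 if it could not be converted. */
int toy_read_page(ToyPage *page, ToyRelKind kind)
{
    assert(page != NULL);

    if (page->bytes[0] == TOY_CURRENT_VERSION)
        return 0;               /* already current, nothing to do */

    if (page->bytes[0] == 1 && kind == TOY_REL_HEAP)
        return toy_convert_heap_v1_to_v2(page);

    return -1;                  /* no converter known for this combination */
}
```

In this sketch a second read of the same page is a no-op, and a page
that is too full is left in the old format with an error code, which is
exactly the case the next objection is about.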

> Another issue is that it might not be possible to update a page for
> lack of space. Are we prepared to assume that there will never be a
> transformation we need to apply that makes the data bigger?

We do need some solution to that. One idea is to run a pre-upgrade
script in the old version that scans the database and moves tuples that
would no longer fit on their pages in the new version. This could be run
before the upgrade, while the old database is still running, so it would
be acceptable for that to take some time.
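
A rough sketch of what such a pre-upgrade scan would have to compute,
again in C with invented names (nothing here is an existing PostgreSQL
or upgrade-tool API). It assumes we can somehow obtain, per page, the
bytes currently used and the bytes by which the contents would grow in
the new format; both inputs are assumptions for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes that must be moved off a page so that its contents still fit
 * after growing by growth_bytes. Returns 0 if the page already fits. */
size_t toy_bytes_to_move(size_t used_bytes, size_t growth_bytes,
                         size_t page_capacity)
{
    size_t needed = used_bytes + growth_bytes;

    return needed > page_capacity ? needed - page_capacity : 0;
}

/* Scan all pages and record the indices of those that need tuples moved
 * elsewhere before the upgrade. out_indices must have room for npages
 * entries. Returns the number of indices written. */
size_t toy_find_overfull_pages(const size_t *used, const size_t *growth,
                               size_t npages, size_t page_capacity,
                               size_t *out_indices)
{
    size_t count = 0;

    assert(used != NULL && growth != NULL && out_indices != NULL);

    for (size_t i = 0; i < npages; i++) {
        if (toy_bytes_to_move(used[i], growth[i], page_capacity) > 0)
            out_indices[count++] = i;
    }
    return count;
}
```

The actual moving of tuples is left out; the point is only that the
check is a cheap per-page computation that can run while the old
version is still serving queries.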

No doubt people would prefer something better than that. Another idea
would be to have some over-sized buffers that can be used as the target
of conversion, until some tuples are moved off to another page. Perhaps
the over-sized buffers wouldn't need to be in shared memory, if they're
read-only until some tuples are moved.

This is pretty hand-wavy, I know. The point is, I don't think these
problems are insurmountable.

> (Likely counterexample: adding collation info to text values.)

I doubt it, as collation is not a property of text values, but of
operations. But that's off-topic...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
