Re: Proposal: Multiversion page api (inplace upgrade)

From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Multiversion page api (inplace upgrade)
Date: 2008-06-11 16:02:22
Message-ID: 484FF70E.9060407@sun.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>>> (this won't come as a surprise as we talked about this in PGCon, but)
>>> I think we should rather convert the page structure to new format in
>>> ReadBuffer the first time a page is read in. That would keep the
>>> changes a lot more isolated.
>>
>> The problem is that ReadBuffer is an extremely low-level environment,
>> and it's not clear that it's possible (let alone practical) to do a
>> conversion at that level in every case.
>
> Well, we can't predict the future, and can't guarantee that it's
> possible or practical to do the things we need to do in the future no
> matter what approach we choose.
>
>> In particular it hardly seems
>> sane to expect ReadBuffer to do tuple content conversion, which is going
>> to be practically impossible to perform without any catalog accesses.
>
> ReadBuffer has access to Relation, which has information about what kind
> of a relation it's dealing with, and TupleDesc. That should get us
> pretty far. It would be a modularity violation, for sure, but I could
> live with that for the purpose of page version conversion.

But if you look, for example, at the hash index implementation, some pages are
not in the regular page format, and converting them could require information
that we do not have available in ReadBuffer.
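To make the objection concrete: a convert-on-read approach would presumably dispatch on the page layout version stored in the page header, something like the standalone sketch below (all names and the header layout here are simplified illustrations, not the real bufmgr code). The catch is exactly the one above — this only works for pages that carry a recognizable version in the common header, which non-regular pages (hash bitmap/meta pages, etc.) may not.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, simplified page header: only the layout version matters here.
 * (In the real header the version shares a field with the page size.) */
typedef struct
{
    uint16_t    pd_pagesize_version;    /* low byte holds the layout version */
} FakePageHeader;

#define PG_PAGE_LAYOUT_VERSION 4

typedef void (*PageConverter) (void *page);

/* Hypothetical converter for layout v3 -> v4: here it just bumps the
 * version byte; a real one would rewrite line pointers, tuples, etc. */
static void
convert_v3_to_v4(void *page)
{
    FakePageHeader *hdr = (FakePageHeader *) page;

    hdr->pd_pagesize_version = (hdr->pd_pagesize_version & 0xFF00) | 4;
}

/* One converter slot per old layout version. */
static PageConverter converters[PG_PAGE_LAYOUT_VERSION] = {
    [3] = convert_v3_to_v4,
};

/* Would be called right after a page is read in; fails when no converter
 * is registered -- e.g. for a page that is not in the regular format. */
static int
convert_page_on_read(void *page)
{
    FakePageHeader *hdr = (FakePageHeader *) page;
    uint8_t     version = hdr->pd_pagesize_version & 0x00FF;

    while (version < PG_PAGE_LAYOUT_VERSION)
    {
        if (converters[version] == NULL)
            return -1;          /* cannot convert at this level */
        converters[version](page);
        version = hdr->pd_pagesize_version & 0x00FF;
    }
    return 0;
}
```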

>> Another issue is that it might not be possible to update a page for
>> lack of space. Are we prepared to assume that there will never be a
>> transformation we need to apply that makes the data bigger?
>
> We do need some solution to that. One idea is to run a pre-upgrade
> script in the old version that scans the database and moves tuples that
> would no longer fit on their pages in the new version. This could be run
> before the upgrade, while the old database is still running, so it would
> be acceptable for that to take some time.

That would not work for indexes, and do not forget TOAST chunks. I think in
some cases you could end up with an unused quarter of each page in a TOAST
table.
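For what it's worth, the pre-upgrade scan Heikki describes would essentially boil down to a per-page fit check like the sketch below (all sizes are made-up assumptions for illustration, not the real header sizes): given the assumed growth of the page and tuple headers in the new format, decide whether every tuple on the page would still fit, and move some off if not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative sizes only -- NOT the real PostgreSQL values. */
#define BLCKSZ              8192
#define OLD_PAGE_OVERHEAD     20    /* old page header + special space */
#define NEW_PAGE_OVERHEAD     24    /* new format assumed to be larger */
#define TUPLE_GROWTH           4    /* assumed per-tuple header growth */

/* Would every tuple on this page still fit after conversion to the
 * (hypothetically larger) new format? */
static bool
page_fits_after_upgrade(size_t ntuples, size_t used_bytes)
{
    size_t      new_used = used_bytes
        + (NEW_PAGE_OVERHEAD - OLD_PAGE_OVERHEAD)
        + ntuples * TUPLE_GROWTH;

    return new_used <= BLCKSZ;
}
```

A sparsely filled heap page passes; a nearly full one with many tuples fails and would need tuples moved before the upgrade — which is exactly what is hard to arrange for index pages and fixed-size TOAST chunks.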

> No doubt people would prefer something better than that. Another idea
> would be to have some over-sized buffers that can be used as the target
> of conversion, until some tuples are moved off to another page. Perhaps
> the over-sized buffer wouldn't need to be in shared memory, if they're
> read-only until some tuples are moved.

Anyway, you need a mechanism to mark such a page as read-only, which also
requires a lot of modification, and some mechanism to decide when the page
gets converted. I guess this approach would require modifications similar to
convert-on-write.

> This is pretty hand-wavy, I know. The point is, I don't think these
> problems are insurmountable.
>
>> (Likely counterexample: adding collation info to text values.)
>
> I doubt it, as collation is not a property of text values, but
> operations. But that's off-topic...

Yes, it is off-topic; however, I think Tom is right :-).

Zdenek
