Re: WIP: relation metapages

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: relation metapages
Date: 2012-06-15 00:20:04
Message-ID: CA+TgmoazFS8kvpYBCgLpcUSSC+4cC-Ezv9mbjea2ne7RHvL1SA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 14, 2012 at 5:34 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 14.06.2012 18:01, Robert Haas wrote:
>> What I'm really looking for at this stage of the game is feedback on
>> the design decisions I made.  The intention here is that it should be
>> possible to read old-format heaps and indexes transparently, but that
>> when we create or rewrite a relation, we add a new-style metapage.
>
> That dodges the problem of pg_upgrade, but eventually, we will want to use
> the metapage for something important, and it can't be optional at that point
> anymore. So we will eventually need to deal with pg_upgrade somehow.

Well, the code as I've written deals with pg_upgrade just fine: you
can move your old relation files over and they still work. What's
missing at present is an efficient way to convert them to the new
format. If you don't mind the inefficiency, you can use VACUUM FULL
or CLUSTER, and you're there, but obviously we'll want something
better before we start relying on this for anything too critical. For
indexes, it should be pretty trivial to reduce the requirement from
"rewrite while holding AccessExclusiveLock" to "brief
AccessExclusiveLock". Everything except GiST already has a metapage,
so you just need to rewrite that page in the new format, which is a
SMOP. GiST requires moving the existing metapage out to a free page
(or a new page) and writing a metapage pointing to it into block 0,
which is again pretty simple. Heaps are a bit harder. We could adopt
your idea of storing a block number in the metablock; any index TIDs
that point to block 0 will be interpreted as pointing to that block
instead. Then we can basically just relocate block to any free block,
or a new one, as with GiST. Alternatively, we can read the tuples in
block 0 and reinsert them; vacuum; and then repurpose block 0. Taking
it even further, we could try to do better than "brief
AccessExclusiveLock". That might be possible too, especially for
indexes, but I'm not sure that it's necessary, and I haven't thought
through the details.

Even if we just get it down to "brief AccessExclusiveLock", I think we
might also be able to improve the experience by making autovacuum do
conversions automatically. So, if you pg_upgrade, the first
autovacuum worker that spins through the database will
ConditionalLockAcquire an AccessExclusiveLock on each relation in turn
and try to do the conversion. If it can't get the lock it'll keep
retrying until it succeeds. Odds are good that for most people this
would make the addition of the metapage completely transparent. On
the other hand, if metapages exist mostly to support operations like
making an unlogged table logged or visca versa, that's not really
necessary: we can add the metapage when someone performs a DDL
operation that requires it. There is a lot of optimization possible
on top of the basic mechanism, but how much of it makes sense to do,
and which parts, depends on exactly which of the many things we could
do with this we end up deciding to actually do. I'm trying to start
with the basics, which means getting the basic infrastructure in place
and working.

> It would be nice to have the oid of the access method in the metapage (or
> some other way to identify it).

Yeah, I was thinking about that. I think a magic number might be
preferable to an OID, and we actually already have that as the first
4-bytes of the access method metadata for all index types except GIN.
I'm thinking about trying to fix up GIN so that it adds one as well;
the trick is to do it in a way that is backward-compatible, which I
have an idea how to do but haven't tackled yet. We can add a magic
number for heaps as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-06-15 01:27:11 Re: Allow WAL information to recover corrupted pg_controldata
Previous Message Peter Eisentraut 2012-06-14 22:42:12 transforms