Re: heap metapages

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: heap metapages
Date: 2012-05-25 23:42:05
Message-ID: CA+TgmoY7rUPCNSOi44p19szxP0VXWGj2YucJfi+ryDmO+Zj_rw@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 25, 2012 at 5:57 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> It occurred to me that having a metapage with information useful to recovery
> operations in *every segment* would be useful; it certainly seems worth the
> extra block. It then occurred to me that we've basically been stuck with 2
> places to store relation data; either at the relation level in pg_class or
> on each page. Sometimes neither one is a good fit.

AFAICS, having metadata in every segment is mostly only helpful for
recovering from the situation where files have become disassociated
from their filenames, i.e. database -> lost+found. From the viewpoint
of virtually the entire server, the block number space is just a
continuous sequence that starts at 0 and counts up forever (or,
anyway, until 2^32-1). While it wouldn't be impossible to allow that
knowledge to percolate up to other parts of the server, it would
basically involve drilling a fairly arbitrary hole through an
abstraction boundary that has been intact for a very long time, and
it's not clear that there's anything magical about 1GB.
Notwithstanding the foregoing...
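
To make the block-number point concrete, here is a minimal sketch of the
arithmetic that turns a flat block number into a segment file and an
offset. The constants are the default build-time values (8kB blocks, 1GB
segments); the helper names segment_of and offset_in_segment are made up
for illustration and are not the actual storage manager functions.

```c
#include <stdint.h>

/* Default build-time values; both can be changed when compiling. */
#define BLCKSZ      8192u      /* bytes per block */
#define RELSEG_SIZE 131072u    /* blocks per segment file: 1GB / 8kB */

typedef uint32_t BlockNumber;

/* Hypothetical helper: which segment file holds this block? */
static uint32_t
segment_of(BlockNumber blkno)
{
    return blkno / RELSEG_SIZE;
}

/* Hypothetical helper: byte offset of the block inside its segment file. */
static uint64_t
offset_in_segment(BlockNumber blkno)
{
    return (uint64_t) (blkno % RELSEG_SIZE) * BLCKSZ;
}
```

Only code that performs this division needs to know that segments exist;
everything above it just sees block numbers 0, 1, 2, and so on.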

> ISTM that a lot of problems we've faced in the past few years are because
> there's not a good abstraction between a (mostly) linear tuplespace and the
> physical storage that goes underneath it.

...I agree with this. I'm not sure exactly what the replacement model
would look like, but it's definitely worth some thought - e.g. perhaps
there ought to be another mapping layer between logical block numbers
and files on disk, so that we can effectively delete blocks out of the
middle of a relation without requiring any special OS support, and so
that we can multiplex many small relation forks onto a single physical
file to minimize inode consumption.
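
As a rough illustration only, and not a proposed design, such a mapping
layer might look something like the toy sketch below. Everything in it
(ForkMap, SharedFile, fork_extend, fork_lookup, fork_punch, the fixed
MAX_BLOCKS capacity) is hypothetical. It ignores persistence, WAL, and
reuse of freed physical blocks, all of which a real version would need.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BLOCKS 16u            /* toy capacity, illustration only */
#define UNMAPPED   UINT32_MAX     /* marks a logical block with no storage */

/* Hypothetical per-fork map: logical block number -> block in a shared file. */
typedef struct ForkMap
{
    uint32_t nblocks;             /* logical length of the fork */
    uint32_t phys[MAX_BLOCKS];    /* physical block for each logical block */
} ForkMap;

/* Hypothetical shared file: hands out physical blocks to any fork. */
typedef struct SharedFile
{
    uint32_t next_free;           /* next never-used physical block */
} SharedFile;

/* Append one logical block to a fork, backed by a fresh physical block. */
static uint32_t
fork_extend(ForkMap *fork, SharedFile *file)
{
    assert(fork->nblocks < MAX_BLOCKS);
    fork->phys[fork->nblocks] = file->next_free++;
    return fork->nblocks++;
}

/* Translate a logical block number; returns UNMAPPED for a deleted block. */
static uint32_t
fork_lookup(const ForkMap *fork, uint32_t logical)
{
    assert(logical < fork->nblocks);
    return fork->phys[logical];
}

/* Drop the storage behind a middle block without renumbering its neighbours. */
static void
fork_punch(ForkMap *fork, uint32_t logical)
{
    assert(logical < fork->nblocks);
    fork->phys[logical] = UNMAPPED;
}
```

Two forks that extend alternately end up interleaved in the same
SharedFile, which is the inode-saving part; fork_punch is the
"delete from the middle" part, since the logical numbering of the
remaining blocks does not change.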

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
