On Sun, Jan 31, 2010 at 3:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Back in September I wrote:
>> ... The sticking point --- not only for pg_class,
>> but for shared catalogs such as pg_database --- is the lack of a
>> way to track relfilenode if it ever changes. What if we keep
>> the relfilenode of these critical tables someplace else? For
>> instance, we could have a "map" file in each database holding
>> the relfilenode of pg_class, and one in $PGDATA/global holding
>> the relfilenodes of the shared catalogs and indexes. It'd be
>> possible to update a map file atomically via the rename(2) trick.
>> Then we teach relcache or some similar place to believe the map
>> files over the contents of pg_class.
> Thinking about this some more, I can see one small disadvantage:
> for the relations that we use the map file for, pg_class.relfilenode
> would not be trustworthy. This would not affect most of the system
> internals (which will be looking at the relcache's copy, which would
> be kept valid by the relcache code). But it would affect user queries,
> such as for example attempts to use contrib/oid2name to identify a
> file on-disk. The main case where pg_class.relfilenode would be
> likely to be out-of-sync is for shared catalogs. We could keep it
> up to date in most cases for local catalogs, but there's no hope
> of reaching into other databases' pg_class when a shared catalog
> is relocated.
> What I'd suggest doing about this is:
> (1) Store zero in pg_class.relfilenode for those catalogs for which
> the map is used. This at least makes it obvious that the value
> you're looking at isn't valid.
> (2) Provide a SQL function to extract the real relfilenode of any
> specified pg_class entry. We'd have to modify oid2name and
> pg_dump to know to use the function instead of looking at the
> There might be some other client-side code that would be broken
> until it got taught about the function, but hopefully not much.
Seems reasonable to me (assuming there's no way to avoid changing the
relfilenode, which I assume is the case but don't actually know the
code well enough to say with certainty).
In response to
pgsql-hackers by date
|Next:||From: Simon Riggs||Date: 2010-01-31 23:31:50|
|Subject: Re: Re: [COMMITTERS] pgsql: Augment WAL records for
btree delete with GetOldestXmin() to|
|Previous:||From: Mark Cave-Ayland||Date: 2010-01-31 22:24:40|
|Subject: Re: development setup and libdir|