Re: Eliminating VACUUM FULL WAS: remove flatfiles.c

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Eliminating VACUUM FULL WAS: remove flatfiles.c
Date: 2010-01-31 22:58:55
Message-ID: 603c8f071001311458i767e8c04l69c38c00053287b3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 31, 2010 at 3:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Back in September I wrote:
>> ... The sticking point --- not only for pg_class,
>> but for shared catalogs such as pg_database --- is the lack of a
>> way to track relfilenode if it ever changes.  What if we keep
>> the relfilenode of these critical tables someplace else?  For
>> instance, we could have a "map" file in each database holding
>> the relfilenode of pg_class, and one in $PGDATA/global holding
>> the relfilenodes of the shared catalogs and indexes.  It'd be
>> possible to update a map file atomically via the rename(2) trick.
>> Then we teach relcache or some similar place to believe the map
>> files over the contents of pg_class.
>
> Thinking about this some more, I can see one small disadvantage:
> for the relations that we use the map file for, pg_class.relfilenode
> would not be trustworthy.  This would not affect most of the system
> internals (which will be looking at the relcache's copy, which would
> be kept valid by the relcache code).  But it would affect user queries,
> such as for example attempts to use contrib/oid2name to identify a
> file on-disk.  The main case where pg_class.relfilenode would be
> likely to be out-of-sync is for shared catalogs.  We could keep it
> up to date in most cases for local catalogs, but there's no hope
> of reaching into other databases' pg_class when a shared catalog
> is relocated.
>
> What I'd suggest doing about this is:
>
> (1) Store zero in pg_class.relfilenode for those catalogs for which
> the map is used.  This at least makes it obvious that the value
> you're looking at isn't valid.
>
> (2) Provide a SQL function to extract the real relfilenode of any
> specified pg_class entry.  We'd have to modify oid2name and
> pg_dump to know to use the function instead of looking at the
> column.
>
> There might be some other client-side code that would be broken
> until it got taught about the function, but hopefully not much.
>
> Thoughts?

Seems reasonable to me (assuming there's no way to avoid changing the
relfilenode, which I assume is the case but don't actually know the
code well enough to say with certainty).

...Robert

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2010-01-31 23:31:50 Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Previous Message Mark Cave-Ayland 2010-01-31 22:24:40 Re: development setup and libdir