From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Eliminating VACUUM FULL WAS: remove flatfiles.c |
Date: | 2010-01-31 22:58:55 |
Message-ID: | 603c8f071001311458i767e8c04l69c38c00053287b3@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Jan 31, 2010 at 3:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Back in September I wrote:
>> ... The sticking point --- not only for pg_class,
>> but for shared catalogs such as pg_database --- is the lack of a
>> way to track relfilenode if it ever changes. What if we keep
>> the relfilenode of these critical tables someplace else? For
>> instance, we could have a "map" file in each database holding
>> the relfilenode of pg_class, and one in $PGDATA/global holding
>> the relfilenodes of the shared catalogs and indexes. It'd be
>> possible to update a map file atomically via the rename(2) trick.
>> Then we teach relcache or some similar place to believe the map
>> files over the contents of pg_class.
>
> Thinking about this some more, I can see one small disadvantage:
> for the relations that we use the map file for, pg_class.relfilenode
> would not be trustworthy. This would not affect most of the system
> internals (which will be looking at the relcache's copy, which would
> be kept valid by the relcache code). But it would affect user queries,
> such as for example attempts to use contrib/oid2name to identify a
> file on-disk. The main case where pg_class.relfilenode would be
> likely to be out-of-sync is for shared catalogs. We could keep it
> up to date in most cases for local catalogs, but there's no hope
> of reaching into other databases' pg_class when a shared catalog
> is relocated.
>
> What I'd suggest doing about this is:
>
> (1) Store zero in pg_class.relfilenode for those catalogs for which
> the map is used. This at least makes it obvious that the value
> you're looking at isn't valid.
>
> (2) Provide a SQL function to extract the real relfilenode of any
> specified pg_class entry. We'd have to modify oid2name and
> pg_dump to know to use the function instead of looking at the
> column.
>
> There might be some other client-side code that would be broken
> until it got taught about the function, but hopefully not much.
>
> Thoughts?
Seems reasonable to me (assuming there's no way to avoid changing the
relfilenode, which I assume is the case but don't actually know the
code well enough to say with certainty).
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2010-01-31 23:31:50 | Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to |
Previous Message | Mark Cave-Ayland | 2010-01-31 22:24:40 | Re: development setup and libdir |