Re: New VACUUM FULL

From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: New VACUUM FULL
Date: 2010-01-04 02:50:56
Message-ID: 20100104115056.98C2.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> So, what is the roadmap for getting this done? It seems like to get
> rid of VFI completely, we would need to implement something like what
> Tom described here:
>
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00249.php
>
> I'm not sure whether the current patch is a good intermediate step
> towards that ultimate goal, or whether events have overtaken it.

I think the most desirable roadmap is:
1. Enable CLUSTER on non-critical system catalogs.
2. Also enable CLUSTER and REINDEX on critical system catalogs.
3. Remove VFI and re-implement VACUUM FULL with a CLUSTER-based approach.
   It should also be optimized as Simon suggested.

My patch was intended to do 3, but we should not skip 1 and 2. With this
roadmap, we never have two versions of VACUUM FULL (INPLACE and REWRITE) at
the same time.

I think we can do 1 immediately. The comment in the CLUSTER code says it
"might work", and I think so too. CLUSTERable toast tables are obviously useful.
    /*
     * Disallow clustering system relations. This will definitely NOT work
     * for shared relations (we have no way to update pg_class rows in other
     * databases), nor for nailed-in-cache relations (the relfilenode values
     * for those are hardwired, see relcache.c). It might work for other
     * system relations, but I ain't gonna risk it.
     */

For 2, we need some kind of "relfilenode mapper" for shared relations
and critical local tables (pg_class, pg_attribute, pg_proc, and pg_type).
I'm thinking that we store only "virtual" relfilenodes for them in pg_class
and keep the actual relfilenodes in shared memory. For example,
smgropen(1248:pg_database) is redirected to smgropen(mapper[1248]).
Since we cannot touch pg_class in databases we are not connected to, we must
avoid updating pg_class when we assign new relfilenodes to shared relations.
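
To make the redirection idea concrete, here is a minimal sketch in C. It is
only an illustration under my own assumptions: the names RelMapper,
RelMapEntry, relmap_lookup, relmap_assign and MAX_MAPPED_RELS are
hypothetical and do not exist in the PostgreSQL source, Oid is a local
stand-in typedef, and a plain struct stands in for what would have to be a
properly locked shared-memory area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;           /* local stand-in for PostgreSQL's Oid */

#define MAX_MAPPED_RELS 64      /* hypothetical capacity of the mapper */

/*
 * One mapping: the fixed "virtual" relfilenode recorded in pg_class and
 * the actual on-disk relfilenode it currently resolves to.
 */
typedef struct RelMapEntry
{
    Oid         virtual_node;
    Oid         actual_node;
} RelMapEntry;

/* Hypothetical mapper; a real one would live in shared memory. */
typedef struct RelMapper
{
    int         count;
    RelMapEntry entries[MAX_MAPPED_RELS];
} RelMapper;

/*
 * Resolve a virtual relfilenode to the actual one.  Relations that have
 * no entry in the mapper resolve to themselves.
 */
Oid
relmap_lookup(const RelMapper *map, Oid virtual_node)
{
    assert(map != NULL);
    for (int i = 0; i < map->count; i++)
    {
        if (map->entries[i].virtual_node == virtual_node)
            return map->entries[i].actual_node;
    }
    return virtual_node;
}

/*
 * Point a virtual relfilenode at a new actual relfilenode.  Only the
 * mapper changes, so no pg_class row has to be updated.
 * Returns 0 on success, -1 if the mapper is full.
 */
int
relmap_assign(RelMapper *map, Oid virtual_node, Oid actual_node)
{
    assert(map != NULL);
    for (int i = 0; i < map->count; i++)
    {
        if (map->entries[i].virtual_node == virtual_node)
        {
            map->entries[i].actual_node = actual_node;
            return 0;
        }
    }
    if (map->count >= MAX_MAPPED_RELS)
        return -1;
    map->entries[map->count].virtual_node = virtual_node;
    map->entries[map->count].actual_node = actual_node;
    map->count++;
    return 0;
}
```

In this sketch, a caller would pass relmap_lookup(&map, node) to the storage
layer instead of the value read from pg_class, and a rewrite of a mapped
relation would end with relmap_assign() rather than a catalog update.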

We also need to store the nodes in an additional flat file. Another
approach would be to store them in the control file for shared relations
(ControlFileData.shared_relfilenode_mapper as Oid[]), or in pg_database
for local tables (pg_database.datclassnode, datprocnode, etc.).
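
As a rough illustration of the flat-file option, the sketch below saves and
reloads a list of mappings. Everything in it is an assumption on my part:
the file layout (an entry count followed by raw entries), the function names
relmap_save and relmap_load, and the write-to-temporary-file-then-rename
step, which relies on POSIX rename() replacing the target atomically. A real
implementation would also need fsync, a checksum, and WAL coordination, none
of which is shown here.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;           /* local stand-in for PostgreSQL's Oid */

/* Hypothetical on-disk entry: virtual relfilenode -> actual relfilenode. */
typedef struct RelMapEntry
{
    Oid         virtual_node;
    Oid         actual_node;
} RelMapEntry;

/*
 * Write the entries to tmp_path, then rename it over path so that readers
 * see either the old file or the new one (assuming POSIX rename semantics).
 * Returns 0 on success, -1 on failure.
 */
int
relmap_save(const char *path, const char *tmp_path,
            const RelMapEntry *entries, uint32_t count)
{
    FILE       *fp = fopen(tmp_path, "wb");
    int         ok;

    if (fp == NULL)
        return -1;
    ok = fwrite(&count, sizeof(count), 1, fp) == 1 &&
        fwrite(entries, sizeof(RelMapEntry), count, fp) == count;
    if (fclose(fp) != 0)
        ok = 0;
    if (!ok)
    {
        remove(tmp_path);       /* do not leave a partial file behind */
        return -1;
    }
    return rename(tmp_path, path) == 0 ? 0 : -1;
}

/*
 * Read the entries back.  Returns the number of entries read, or -1 if
 * the file is missing, truncated, or holds more than max_entries.
 */
int
relmap_load(const char *path, RelMapEntry *entries, uint32_t max_entries)
{
    FILE       *fp = fopen(path, "rb");
    uint32_t    count = 0;

    if (fp == NULL)
        return -1;
    if (fread(&count, sizeof(count), 1, fp) != 1 ||
        count > max_entries ||
        fread(entries, sizeof(RelMapEntry), count, fp) != count)
    {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return (int) count;
}
```

The control-file and pg_database alternatives would keep the same
virtual-to-actual pairs and differ only in where they are persisted.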

Which approach would be better?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
