Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Kevin Grittner <kgrittn(at)mail(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
Date: 2013-11-27 21:33:20
Message-ID: 20131127213320.GA3785@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote:
> Amit Kapila wrote:
> > On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:
>
> >> Surely VACUUM FULL should rebuild the visibility map, and make
> >> tuples in the new relation all-visible, no?
>
> Certainly it seems odd to me that VACUUM FULL leaves the the table
> in a less-well maintained state in terms of visibility than a
> "normal" vacuum. VACUUM FULL should not need to be followed by
> another VACUUM.
>
> > I think it cannot made all visible.
>
> I don't think all tuples in the relation are necessarily visible to
> all transactions, but the ones which are should probably be flagged
> that way.

I have developed the attached proof-of-concept patch to fix the problem
of having no visibility map after CLUSTER or VACUUM FULL. I tested with
these queries:

CREATE TABLE test(x INT PRIMARY KEY);
INSERT INTO test VALUES (1);
VACUUM FULL test; -- or CLUSTER
SELECT relfilenode FROM pg_class WHERE relname = 'test';
relfilenode
-------------
16399

Then 'ls -l data/base/16384/16399*' to see the *_vm file. I am not sure
how to test that the vm contents are valid.

This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior
does not do writes through the shared buffer cache, as outlined in this
C comment block:

* We can't use the normal heap_insert function to insert into the new
* heap, because heap_insert overwrites the visibility information.
* We use a special-purpose raw_heap_insert function instead, which
* is optimized for bulk inserting a lot of tuples, knowing that we have
* exclusive access to the heap. raw_heap_insert builds new pages in
* local storage. When a page is full, or at the end of the process,
* we insert it to WAL as a single record and then write it to disk
* directly through smgr. Note, however, that any data sent to the new
* heap's TOAST table will go through the normal bufmgr.

I originally tried to do this higher up in the stack but ran into
problems because I couldn't access the new heap page so I had to do it
at the non-shared-buffer page level. I reused the lazy vacuum routines.

I need to know this is the right approach, and need to know what things
are wrong or missing.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Attachment Content-Type Size
vacuum.diff text/x-diff 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2013-11-27 21:38:45 Re: [GENERAL] pg_upgrade ?deficiency
Previous Message Peter Eisentraut 2013-11-27 21:22:53 Re: Modify the DECLARE CURSOR command tag depending on the scrollable flag