From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Kevin Grittner <kgrittn(at)mail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL |
Date: | 2013-11-27 21:33:20 |
Message-ID: | 20131127213320.GA3785@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote:
> Amit Kapila wrote:
> > On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:
>
> >> Surely VACUUM FULL should rebuild the visibility map, and make
> >> tuples in the new relation all-visible, no?
>
> Certainly it seems odd to me that VACUUM FULL leaves the the table
> in a less-well maintained state in terms of visibility than a
> "normal" vacuum. VACUUM FULL should not need to be followed by
> another VACUUM.
>
> > I think it cannot made all visible.
>
> I don't think all tuples in the relation are necessarily visible to
> all transactions, but the ones which are should probably be flagged
> that way.
I have developed the attached proof-of-concept patch to fix the problem
of having no visibility map after CLUSTER or VACUUM FULL. I tested with
these queries:
CREATE TABLE test(x INT PRIMARY KEY);
INSERT INTO test VALUES (1);
VACUUM FULL test; -- or CLUSTER
SELECT relfilenode FROM pg_class WHERE relname = 'test';
relfilenode
-------------
16399
Then 'ls -l data/base/16384/16399*' to see the *_vm file. I am not sure
how to test that the vm contents are valid.
This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior
does not do writes through the shared buffer cache, as outlined in this
C comment block:
* We can't use the normal heap_insert function to insert into the new
* heap, because heap_insert overwrites the visibility information.
* We use a special-purpose raw_heap_insert function instead, which
* is optimized for bulk inserting a lot of tuples, knowing that we have
* exclusive access to the heap. raw_heap_insert builds new pages in
* local storage. When a page is full, or at the end of the process,
* we insert it to WAL as a single record and then write it to disk
* directly through smgr. Note, however, that any data sent to the new
* heap's TOAST table will go through the normal bufmgr.
I originally tried to do this higher up in the stack but ran into
problems because I couldn't access the new heap page so I had to do it
at the non-shared-buffer page level. I reused the lazy vacuum routines.
I need to know this is the right approach, and need to know what things
are wrong or missing.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Attachment | Content-Type | Size |
---|---|---|
vacuum.diff | text/x-diff | 7.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2013-11-27 21:38:45 | Re: [GENERAL] pg_upgrade ?deficiency |
Previous Message | Peter Eisentraut | 2013-11-27 21:22:53 | Re: Modify the DECLARE CURSOR command tag depending on the scrollable flag |