Re: Write visibility map during CLUSTER/VACUUM FULL

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Anna Akenteva <a(dot)akenteva(at)postgrespro(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Write visibility map during CLUSTER/VACUUM FULL
Date: 2021-08-30 00:26:42
Message-ID: 20210830002642.GT26465@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 28, 2021 at 11:22:01AM +0300, Anna Akenteva wrote:
> On 2019-11-29 05:32, Michael Paquier wrote:
> > These comments are unanswered for more than 2 months, so I am marking
> > this entry as returned with feedback.
>
> I'd like to revive this patch. To make further work easier, attaching a
> rebased version of the latest patch by Alexander.
>
> As for having yet another copy of logic determining visibility: the
> visibility check was mainly taken from heap_page_is_all_visible(), so I
> refactored the code to make sure that logic is not duplicated. The updated
> patch is attached too.

I agree that it's strange that VACUUM(FREEZE) freezes tuples but not VM bits,
nor page-level PD_ALL_VISIBLE (which is implied by all frozen). After FULL
runs (taking an exclusive lock on the table), it's necessary to then vacuum the
table again to get what's intended.

Rebased on f10f0ae420ee62400876ab34dca2c09c20dcd030.

And rephrased Anna's two independent/alternate patches as a 2nd patch on top of
the 1st, as that helps me to read it and reduces its total size.

I noticed in testing the patch that autovacuum is still hitting the relation
shortly after vacuum full. This is due to n_ins_since_autovacuum, which is new
in pg13. I don't know how to address that (or even if it should be addressed
at all).

Also, pg_class.relallvisible is not set unless vacuum/analyze is run again
(which is mitigated by the n_ins behavior above). It seems like this might be
important: an plan which uses index-only scan might be missed in favour of
something else until autovacuum runs (it will use cost-based delay, and might
not be scheduled immediately, could be interrupted, or even diasbled).

I'm testing like this:
CREATE EXTENSION IF NOT EXISTS pg_visibility ; DROP TABLE t; CREATE TABLE t AS SELECT generate_series(1,99999); VACUUM FULL t; ANALYZE t; SELECT n_ins_since_vacuum, n_tup_upd, n_tup_del, n_tup_hot_upd FROM pg_stat_all_tables WHERE relname='t'; SELECT relpages, relallvisible FROM pg_class WHERE oid='t'::regclass; SELECT COUNT(1), COUNT(1)FILTER(WHERE all_visible='t') allvis, COUNT(1)FILTER(WHERE all_frozen='t') allfrz, COUNT(1)FILTER(WHERE pd_all_visible='t') allvispd FROM pg_visibility('t');

--
Justin

Attachment Content-Type Size
0001-VACUUM-FULL-CLUSTER-set-VM-and-page-level-visibility.patch text/x-diff 10.0 KB
0002-refactor-the-code-to-make-sure-that-logic-is-not-dup.patch text/x-diff 12.4 KB
0003-cluster-set-relallvisible-num_pages.patch text/x-diff 815 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-08-30 00:55:53 Re: [PATCH] Tab completion for ALTER TABLE … ADD …
Previous Message Yura Sokolov 2021-08-30 00:18:33 jff: checksum algorithm is not as intended