Re: VACUUM (DISABLE_PAGE_SKIPPING on)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: VACUUM (DISABLE_PAGE_SKIPPING on)
Date: 2020-11-18 10:27:54
Message-ID: CAD21AoC=e8TjSa3yhdP9Vgiu8BESVvo-j6Ma7AE+UrDYLtSF+A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 17, 2020 at 8:27 PM Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Mon, 16 Nov 2020 at 22:53, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Tue, Nov 17, 2020 at 5:52 AM Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> > I don't think the doc is wrong. If DISABLE_PAGE_SKIPPING is specified,
> > we not only set aggressive = true but also skip checking visibility
> > map. For instance, see line 905 and line 963, lazy_scan_heap().
>
> OK, so you're saying that the docs illustrate the true intention of
> the patch, which I immediately accept since I know you were the
> author. Forgive me for not discussing it with you first, I thought
> this was a clear cut case.
>
> But that then highlights another area where the docs are wrong...
>
> > On Tue, Nov 17, 2020 at 5:52 AM Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > > The docs do correctly say "Pages where all tuples are known to be
> > > frozen can always be skipped". Checking the code, lazy_scan_heap()
> > > comments say
> > > "we can still skip pages that are all-frozen, since such pages do not
> > > need freezing".
>
> The docs say this:
> "Pages where all tuples are known to be frozen can always be skipped."
> Why bother to say that if the feature then ignores that point and
> scans them anyway?
> May I submit a patch to remove that sentence?

I think the docs first describe the page-skipping behaviour and then
describe what the DISABLE_PAGE_SKIPPING option does. When we discussed
this feature, I thought that sentence would help users understand the
option, since the option name could be confusing without a description
of page skipping. So I'm not sure the sentence is unnecessary, but if it
has confused you, it needs to be improved somehow.

>
> Anyway, we're back to where I started: looking for a user-initiated
> command option that allows a table to be scanned aggressively so that
> relfrozenxid can be moved forward as quickly as possible. This is what
> I thought that you were trying to achieve with DISABLE_PAGE_SKIPPING
> option, my bad.
>
> Now David J, above, says this would be VACUUM FREEZE, but I don't
> think that is right. Setting VACUUM FREEZE has these effects: 1) makes
> a vacuum aggressive, but it also 2) moves the freeze limit so high
> that it freezes mostly everything. (1) allows the vacuum to reset
> relfrozenxid, but (2) actually slows down the scan by making it freeze
> more blocks than it would do normally.
>
> So we have 3 ways to reset relfrozenxid by a user action:
> VACUUM (DISABLE_PAGE_SKIPPING ON) - scans all blocks, deliberately
> ignoring the ones it could have skipped. This certainly slows it down.
> VACUUM (FREEZE ON) - freezes everything in its path, slowing down the
> scan by writing too many blocks.
> VACUUM (FULL on) - rewrites table and rebuilds index, so very slow
>
> What I think we need is a 4th option which aims to move relfrozenxid
> forwards as quickly as possible
> * initiates an aggressive scan, so it does not skip blocks because of
> busy buffer pins
> * skip pages that are all-frozen, as we are allowed to do
> * uses normal freeze limits, so we avoid writing to blocks if possible
>

This can be done with VACUUM today by setting vacuum_freeze_table_age
and vacuum_multixact_freeze_table_age to 0. Adding an option for this
behavior would make it easier for users to understand, and it would
work well with the optimization you propose below.
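
For illustration, a minimal sketch of that approach in a single
session. The table name my_table is hypothetical, and INDEX_CLEANUP
assumes PostgreSQL 12 or later:

```sql
-- Session-local settings: a table age of 0 makes VACUUM in this
-- session use the aggressive strategy.
SET vacuum_freeze_table_age = 0;
SET vacuum_multixact_freeze_table_age = 0;

-- Aggressive scan with the normal freeze limits (no FREEZE option),
-- so all-frozen pages can still be skipped.
-- my_table is a placeholder name, not taken from this thread.
VACUUM (INDEX_CLEANUP OFF, VERBOSE) my_table;

-- Check whether relfrozenxid and relminmxid moved forward.
SELECT relname,
       age(relfrozenxid)    AS xid_age,
       mxid_age(relminmxid) AS multixact_age
FROM pg_class
WHERE relname = 'my_table';
```

Because the SET commands are session-local, other sessions and
autovacuum keep their configured values.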

> If we do all 3 of those things, the scan will complete as quickly as
> possible and reset relfrozenxid quickly. It would make sense to use
> that in conjunction with index_cleanup=off

Agreed.

>
> As an additional optimization, if we do find a row that needs freezing
> on a data block, we should simply freeze *all* row versions on the
> page, not just the ones below the selected cutoff. This is justified
> since writing the block is the biggest cost and it doesn't make much
> sense to leave a few rows unfrozen on a block that we are dirtying.

+1

Regards,

--
Masahiko Sawada
EnterpriseDB: https://www.enterprisedb.com/
