Re: New vacuum option to do only freezing

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New vacuum option to do only freezing
Date: 2019-03-05 09:18:03
Message-ID: CAD21AoBkbNPomNFLaJ=CSrBGBiecwhfrRqs3-fxh-7O9pTwNyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 5, 2019 at 8:27 AM Bossart, Nathan <bossartn(at)amazon(dot)com> wrote:
>
> On 2/28/19, 12:13 AM, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Attached the updated version patch. I've incorporated all review
> > comments I got and have changed the number of tuples being reported as
> > 'removed tuples'. With this option, tuples completely being removed is
> > only tuples marked as unused during HOT-pruning, other dead tuples are
> > left. So we count those tuples during HOT-pruning and reports it as
> > removed tuples.
>
> Thanks for the new patches. Beyond the reloptions discussion, most of
> my feedback is wording suggestions.

Thank you for the comment!

>
> + <command>VACUUM</command> removes dead tuples and prunes HOT-updated
> + tuples chain for live tuples on table. If the table has any dead tuple
> + it removes them from both table and indexes for re-use. With this
> + option <command>VACUUM</command> doesn't completely remove dead tuples
> + and disables removing dead tuples from indexes. This is suitable for
> + avoiding transaction ID wraparound (see
> + <xref linkend="vacuum-for-wraparound"/>) but not sufficient for avoiding
> + index bloat. This option is ignored if the table doesn't have index.
> + This cannot be used in conjunction with <literal>FULL</literal>
> + option.
>
> There are a couple of small changes I would make. How does something
> like this sound?
>
> VACUUM removes dead tuples and prunes HOT-updated tuple chains for
> live tuples on the table. If the table has any dead tuples, it
> removes them from both the table and its indexes and marks the
> corresponding line pointers as available for re-use. With this
> option, VACUUM still removes dead tuples from the table, but it
> does not process any indexes, and the line pointers are marked as
> dead instead of available for re-use. This is suitable for
> avoiding transaction ID wraparound (see Section 24.1.5) but not
> sufficient for avoiding index bloat. This option is ignored if
> the table does not have any indexes. This cannot be used in
> conjunction with the FULL option.

Hmm, that's good idea but I'm not sure that user knows the word 'line
pointer' because it is used only at pageinspect document. I wonder if
the word 'item identifier' would rather be appropriate here because
this is used at at describing page layout(in storage.sgml). Thought?

>
> - * Returns the number of tuples deleted from the page and sets
> - * latestRemovedXid.
> + * Returns the number of tuples deleted from the page and set latestRemoveXid
> + * and increment nunused.
>
> I would say something like: "Returns the number of tuples deleted from
> the page, sets latestRemovedXid, and updates nunused."

Fixed.

>
> + /*
> + * hasindex = true means two-pass strategy; false means one-pass. But we
> + * always use the one-pass strategy when index vacuum is disabled.
> + */
>
> I think the added sentence should make it more clear that hasindex
> will still be true when DISABLE_INDEX_CLEANUP is used. Maybe
> something like:
>
> /*
> * hasindex = true means two-pass strategy; false means one-pass
> *
> * If DISABLE_INDEX_CLEANUP is used, hasindex may still be true,
> * but we'll always use the one-pass strategy.
> */
>

Fixed.

> tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
> - &vacrelstats->latestRemovedXid);
> + &vacrelstats->latestRemovedXid,
> + &tups_pruned);
>
> Why do we need a separate tups_pruned argument in heap_page_prune()?
> Could we add the result of heap_page_prune() to tups_pruned instead,
> then report the total number of removed tuples as tups_vacuumed +
> tups_pruned elsewhere?

Hmm, I thought that we should report only the number of tuples
completely removed but we already count the tulples marked as
redirected as tups_vacuumed. Let me summarize the fate of dead tuples.
I think we can roughly classify dead tuples as follows.

1. root tuple of HOT chain that became dead
2. root tuple of HOT chain that became redirected
3. other tupels of HOT chain that became unused
4. tuples that became dead after HOT pruning

The tuples of #1 through #3 either have only ItemIDs or have been
completely removed but tuples of #4 has its tuple storage because they
are not processed when HOT-pruning.

Currently tups_vacuumed counts all of them, nleft (=
vacrelstats->num_dead_tuples) counts #1 + #4. I think that the number
of removed tuples being reported would be #1 + #2 + #3. Or should we
use #2 + #3 instead?

>
> + * If there are no indexes or we skip index vacuum then we can vacuum
> + * the page right now instead of doing a second scan.
>
> How about:
>
> If there are no indexes or index cleanup is disabled, we can
> vacuum the page right now instead of doing a second scan.

Fixed.

>
> + /*
> + * Here, we have indexes but index vacuum is disabled. We don't
> + * vacuum dead tuples on heap but forget them as we skip index
> + * vacuum. The vacrelstats->dead_tuples could have tuples which
> + * became dead after checked at HOT-pruning time but aren't marked
> + * as dead yet. We don't process them because it's a very rare
> + * condition and the next vacuum will process them.
> + */
>
> I would suggest a few small changes:
>
> /*
> * Here, we have indexes but index vacuum is disabled. Instead of
> * vacuuming the dead tuples on the heap, we just forget them.
> *
> * Note that vacrelstats->dead_tuples could include tuples which
> * became dead after HOT-pruning but are not marked dead yet. We
> * do not process them because this is a very rare condition, and
> * the next vacuum will process them anyway.
> */

Fixed.

>
> - /* If no indexes, make log report that lazy_vacuum_heap would've made */
> + /*
> + * If no index or disables index vacuum, make log report that lazy_vacuum_heap
> + * would've made. If index vacuum is disabled, we didn't remove all dead
> + * tuples but did for tuples removed by HOT-pruning.
> + */
> if (vacuumed_pages)
> ereport(elevel,
> (errmsg("\"%s\": removed %.0f row versions in %u pages",
> RelationGetRelationName(onerel),
> - tups_vacuumed, vacuumed_pages)));
> + skip_index_vacuum ? tups_pruned : tups_vacuumed,
> + vacuumed_pages)));
>
> How about:
>
> /*
> * If no index or index vacuum is disabled, make log report that
> * lazy_vacuum_heap would've made. If index vacuum is disabled,
> * we don't include the tuples that we marked dead, but we do
> * include tuples removed by HOT-pruning.
> */

Fixed.

>
> Another interesting thing I noticed is that this "removed X row
> versions" message is only emitted if vacuumed_pages is greater than 0.
> However, if we only did HOT pruning, tups_vacuumed will be greater
> than 0 while vacuumed_pages will still be 0, so some information will
> be left out. I think this is already the case, though, so this could
> probably be handled in a separate thread.
>

Hmm, since this log message is corresponding to the one that
lazy_vacuum_heap makes and total number of removed tuples are always
reported, it seems consistent to me. Do you have another point?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2019-03-05 09:18:50 Re: extension patch of CREATE OR REPLACE TRIGGER
Previous Message Matwey V. Kornilov 2019-03-05 09:16:27 Re: [PATCH 0/3] Introduce spgist quadtree @<(point,circle) operator