Re: New vacuum option to do only freezing

From: "Bossart, Nathan" <bossartn(at)amazon(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New vacuum option to do only freezing
Date: 2019-03-04 23:27:10
Message-ID: 48410154-E6C5-4C07-8122-8D04E3BCD1F8@amazon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/28/19, 12:13 AM, "Masahiko Sawada" <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Attached the updated version patch. I've incorporated all review
> comments I got and have changed the number of tuples being reported as
> 'removed tuples'. With this option, tuples completely being removed is
> only tuples marked as unused during HOT-pruning, other dead tuples are
> left. So we count those tuples during HOT-pruning and reports it as
> removed tuples.

Thanks for the new patches. Beyond the reloptions discussion, most of
my feedback is wording suggestions.

+ <command>VACUUM</command> removes dead tuples and prunes HOT-updated
+ tuples chain for live tuples on table. If the table has any dead tuple
+ it removes them from both table and indexes for re-use. With this
+ option <command>VACUUM</command> doesn't completely remove dead tuples
+ and disables removing dead tuples from indexes. This is suitable for
+ avoiding transaction ID wraparound (see
+ <xref linkend="vacuum-for-wraparound"/>) but not sufficient for avoiding
+ index bloat. This option is ignored if the table doesn't have index.
+ This cannot be used in conjunction with <literal>FULL</literal>
+ option.

There are a couple of small changes I would make. How does something
like this sound?

VACUUM removes dead tuples and prunes HOT-updated tuple chains for
live tuples on the table. If the table has any dead tuples, it
removes them from both the table and its indexes and marks the
corresponding line pointers as available for re-use. With this
option, VACUUM still removes dead tuples from the table, but it
does not process any indexes, and the line pointers are marked as
dead instead of available for re-use. This is suitable for
avoiding transaction ID wraparound (see Section 24.1.5) but not
sufficient for avoiding index bloat. This option is ignored if
the table does not have any indexes. This cannot be used in
conjunction with the FULL option.

- * Returns the number of tuples deleted from the page and sets
- * latestRemovedXid.
+ * Returns the number of tuples deleted from the page and set latestRemoveXid
+ * and increment nunused.

I would say something like: "Returns the number of tuples deleted from
the page, sets latestRemovedXid, and updates nunused."

+ /*
+ * hasindex = true means two-pass strategy; false means one-pass. But we
+ * always use the one-pass strategy when index vacuum is disabled.
+ */

I think the added sentence should make it more clear that hasindex
will still be true when DISABLE_INDEX_CLEANUP is used. Maybe
something like:

/*
* hasindex = true means two-pass strategy; false means one-pass
*
* If DISABLE_INDEX_CLEANUP is used, hasindex may still be true,
* but we'll always use the one-pass strategy.
*/

tups_vacuumed += heap_page_prune(onerel, buf, OldestXmin, false,
- &vacrelstats->latestRemovedXid);
+ &vacrelstats->latestRemovedXid,
+ &tups_pruned);

Why do we need a separate tups_pruned argument in heap_page_prune()?
Could we add the result of heap_page_prune() to tups_pruned instead,
then report the total number of removed tuples as tups_vacuumed +
tups_pruned elsewhere?

+ * If there are no indexes or we skip index vacuum then we can vacuum
+ * the page right now instead of doing a second scan.

How about:

If there are no indexes or index cleanup is disabled, we can
vacuum the page right now instead of doing a second scan.

+ /*
+ * Here, we have indexes but index vacuum is disabled. We don't
+ * vacuum dead tuples on heap but forget them as we skip index
+ * vacuum. The vacrelstats->dead_tuples could have tuples which
+ * became dead after checked at HOT-pruning time but aren't marked
+ * as dead yet. We don't process them because it's a very rare
+ * condition and the next vacuum will process them.
+ */

I would suggest a few small changes:

/*
* Here, we have indexes but index vacuum is disabled. Instead of
* vacuuming the dead tuples on the heap, we just forget them.
*
* Note that vacrelstats->dead_tuples could include tuples which
* became dead after HOT-pruning but are not marked dead yet. We
* do not process them because this is a very rare condition, and
* the next vacuum will process them anyway.
*/

- /* If no indexes, make log report that lazy_vacuum_heap would've made */
+ /*
+ * If no index or disables index vacuum, make log report that lazy_vacuum_heap
+ * would've made. If index vacuum is disabled, we didn't remove all dead
+ * tuples but did for tuples removed by HOT-pruning.
+ */
if (vacuumed_pages)
ereport(elevel,
(errmsg("\"%s\": removed %.0f row versions in %u pages",
RelationGetRelationName(onerel),
- tups_vacuumed, vacuumed_pages)));
+ skip_index_vacuum ? tups_pruned : tups_vacuumed,
+ vacuumed_pages)));

How about:

/*
* If no index or index vacuum is disabled, make log report that
* lazy_vacuum_heap would've made. If index vacuum is disabled,
* we don't include the tuples that we marked dead, but we do
* include tuples removed by HOT-pruning.
*/

Another interesting thing I noticed is that this "removed X row
versions" message is only emitted if vacuumed_pages is greater than 0.
However, if we only did HOT pruning, tups_vacuumed will be greater
than 0 while vacuumed_pages will still be 0, so some information will
be left out. I think this is already the case, though, so this could
probably be handled in a separate thread.

Nathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-03-04 23:27:46 Re: jsonpath
Previous Message Thomas Munro 2019-03-04 23:13:59 Re: Rare SSL failures on eelpout