Re: New IndexAM API controlling index vacuum strategies

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-03 02:43:40
Message-ID: CAD21AoByN7=fg-A4AN1gasj38hrarBPEezaLZyv2Ma=zLQarzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 2, 2021 at 12:00 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Sun, Feb 21, 2021 at 10:28 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Sorry for the late response.
>
> Me too!

No problem, thank you for your comment!

>
> > 1. Whereas skipping index vacuum and heap vacuum is a very attractive
> > improvement, if we skip that by default I wonder if we need a way to
> > disable it. Vacuum plays a role in cleaning and diagnosing tables in
> > practice. So in a case where the table is bad state and the user wants
> > to clean all heap pages, it would be good to have a way to disable
> > this skipping behavior. One solution would be that index_cleanup
> > option has three different behaviors: on, auto (or smart), and off. We
> > enable this skipping behavior by default in ‘auto’ mode, but
> > specifying "INDEX_CLEANUP true” means to enforce index vacuum and
> > therefore disabling it.
>
> Sounds reasonable to me. Maybe users should express the skipping
> behavior that they desire in terms of the *proportion* of all heap
> blocks with LP_DEAD line pointers that we're willing to have while
> still skipping index vacuuming + lazy_vacuum_heap() heap scan. In
> other words, it can be a "scale" type GUC/param (though based on heap
> blocks *at the end* of the first heap scan, not tuples at the point
> the av launcher considers launching AV workers).

A scale type parameter seems good to me but I wonder if how users can
tune that parameter. We already have tuple-based parameters such as
autovacuum_vacuum_scale_factor/threshold and I think that users
basically don't pay attention to that table updates result in how many
blocks.

And I'm concerned that my above idea could confuse users since what we
want to control is both heap vacuum and index vacuum but it looks like
controlling only index vacuum.

The third idea is a VACUUM command option like DISABLE_PAGE_SKIPPING
to disable such skipping behavior. I imagine that the
user-controllable-option to enforce both heap vacuum and index vacuum
would be required also in the future when we have the vacuum strategy
feature (i.g., incremental vacuum).

>
> > @@ -1299,6 +1303,7 @@ lazy_scan_heap(Relation onerel, VacuumParams
> > *params, LVRelStats *vacrelstats,
> > {
> > lazy_record_dead_tuple(dead_tuples, &(tuple.t_self));
> > all_visible = false;
> > + has_dead_tuples = true;
> > continue;
> > }
> >
> > I added the above change in the patch to count the number of heap
> > pages having at least one LP_DEAD line pointer. But it's weird to me
> > that we have never set has_dead_tuples true when we found an LP_DEAD
> > line pointer.
>
> I think that you're right. However, in practice it isn't harmful
> because has_dead_tuples is only used when "all_visible = true", and
> only to detect corruption (which should never happen). I think that it
> should be fixed as part of this work, though.

Agreed.

>
> Lots of stuff in this area is kind of weird already. Sometimes this is
> directly exposed to users, even. This came up recently, when I was
> working on VACUUM VERBOSE stuff. (This touched the precise piece of
> code you've patched in the quoted diff snippet, so perhaps you know
> some of the story I will tell you now already.)
>
> I recently noticed that VACUUM VERBOSE can report a very low
> tups_vacuumed/"removable heap tuples" when run against tables where
> most pruning is opportunistic pruning rather than VACUUM pruning
> (which is very common), provided there are no HOT updates (which is
> common but not very common). This can be very confusing, because
> VACUUM VERBOSE will report a "tuples_deleted" for the heap relation
> that is far far less than the "tuples_removed" it reports for indexes
> on the same table -- even though both fields have values that are
> technically accurate (e.g., not very affected by concurrent activity
> during VACUUM, nothing like that).
>
> This came to my attention when I was running BenchmarkSQL for the
> 64-bit XID deleted pages patch. One of the BenchmarkSQL tables (though
> only one -- the table whose UPDATEs are not HOT safe, which is unique
> among BenchmarkSQL/TPC-C tables). I pushed a commit with comment
> changes [1] to make that aspect of VACUUM VERBOSE a little less
> confusing. (I was actually running a quick-and-dirty hack that made
> log_autovacuum show VACUUM VERBOSE index stuff -- I would probably
> have missed the weird difference between heap tups_vacuumed and index
> tuples_removed without this custom log_autovacuum hack.)

That's true. I didn't know that.

>
> Just to be clear (I think you agree already): we should base any
> triggering logic for skipping index vacuuming/lazy_vacuum_heap() on
> logic that does not care *when* heap pages first contained LP_DEAD
> line pointers (could be that they were counted in tups_vacuumed due to
> being pruned during this VACUUM operation, could be from an earlier
> opportunistic pruning, etc).

Agreed. We should base only the fact that the page contains LP_DEAD.

>
> > Currently, we set it to false true in 'tupgone' case but
> > it seems to me that we should do that in this case as well since we
> > use this flag in the following check:
> >
> > else if (PageIsAllVisible(page) && has_dead_tuples)
> > {
> > elog(WARNING, "page containing dead tuples is marked as
> > all-visible in relation \"%s\" page %u",
> > vacrelstats->relname, blkno);
> > PageClearAllVisible(page);
> > MarkBufferDirty(buf);
> > visibilitymap_clear(onerel, blkno, vmbuffer,
> > VISIBILITYMAP_VALID_BITS);
> > }
>
> The "tupgone = true"/HEAPTUPLE_DEAD-race case is *extremely* weird. It
> has zero test coverage according to coverage.postgresql.org [2],
> despite being very complicated.
>
> 3 points on the "tupgone = true" weirdness (I'm writing this as a
> record for myself, almost):
>
> 1. It is the reason why lazy_vacuum_heap() must be prepared to set
> tuples LP_UNUSED that are not already LP_DEAD. So when
> lazy_vacuum_page() says "the first dead tuple for this page", that
> doesn't necessarily mean LP_DEAD items! (Though the other cases are
> not even tested, I think -- the lack of "tupgone = true" test coverage
> also means we don't cover corresponding lazy_vacuum_page() cases.)
>
> 2. This is also why we need XLOG_HEAP2_CLEANUP_INFO records (i.e. why
> XLOG_HEAP2_CLEAN records are not sufficient to all required recovery
> conflicts during VACUUM).
>
> 3. And it's also why log_heap_clean() is needed for both
> lazy_scan_heap()'s pruning and lazy_vacuum_heap() unused-marking.
>
> Many years ago, Noah Misch tried to clean this up -- that included
> renaming lazy_vacuum_heap() to lazy_heap_clear_dead_items(), which
> would only deal with LP_DEAD items:
>
> https://www.postgresql.org/message-id/flat/20130108024957.GA4751%40tornado.leadboat.com
>
> Of course, this effort to eliminate the "tupgone =
> true"/XLOG_HEAP2_CLEANUP_INFO special case didn't go anywhere at the
> time.

I'll look at that thread.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2021-03-03 02:54:59 Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Previous Message Andrew Dunstan 2021-03-03 02:20:52 Re: buildfarm windows checks / tap tests on windows