Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date: 2023-01-19 02:10:53
Message-ID: 20230119021053.xn7c5aczln5scen3@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-01-18 17:00:48 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 4:37 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I can, it should be just about trivial code-wise. A bit queasy about trying to
> > forsee the potential consequences.
>
> That's always going to be true, though.
>
> > A somewhat related issue is that pgstat_report_vacuum() sets dead_tuples to
> > what VACUUM itself observed, ignoring any concurrently reported dead
> > tuples. As far as I can tell, when vacuum takes a long time, that can lead to
> > severely under-accounting dead tuples.
>
> Did I not mention that one? There are so many that it can be hard to
> keep track! That's why I catalog them.

I don't recall you doing, but there's lot of emails and holes in my head.

> This creates an awkward but logical question, though: what if
> dead_tuples doesn't go down at all? What if VACUUM actually has to
> increase it, because VACUUM runs so slowly relative to the workload?

Sure, that can happen - but it's not made better by having wrong stats :)

> > I do think this is an argument for splitting up dead_tuples into separate
> > "components" that we track differently. I.e. tracking the number of dead
> > items, not-yet-removable rows, and the number of dead tuples reported from DML
> > statements via pgstats.
>
> Is it? Why?

We have reasonably sophisticated accounting in pgstats what newly live/dead
rows a transaction "creates". So an obvious (and wrong) idea is just decrement
reltuples by the number of tuples removed by autovacuum. But we can't do that,
because inserted/deleted tuples reported by backends can be removed by
on-access pruning and vacuumlazy doesn't know about all changes made by its
call to heap_page_prune().

But I think that if we add a
pgstat_count_heap_prune(nredirected, ndead, nunused)
around heap_page_prune() and a
pgstat_count_heap_vacuum(nunused)
in lazy_vacuum_heap_page(), we'd likely end up with a better approximation
than what vac_estimate_reltuples() does, in the "partially scanned" case.

> I'm all in favor of doing that, of course. I just don't particularly
> think that it's related to this other problem. One problem is that we
> count dead tuples incorrectly because we don't account for the fact
> that things change while VACUUM runs. The other problem is that the
> thing that is counted isn't broken down into distinct subcategories of
> things -- things are bunched together that shouldn't be.

If we only adjust the counters incrementally, as we go, we'd not update them
at the end of vacuum. I think it'd be a lot easier to only update the counters
incrementally if we split ->dead_tuples into sub-counters.

So I don't think it's entirely unrelated.

You probably could get close without splitting the counters, by just pushing
down the counting, and only counting redirected and unused during heap
pruning. But I think it's likely to be more accurate with the split counter.

> Oh wait, you were thinking of what I said before -- my "awkward but
> logical question". Is that it?

I'm not quite following? The "awkward but logical" bit is in the email I'm
just replying to, right?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2023-01-19 02:21:33 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Previous Message Peter Geoghegan 2023-01-19 01:54:27 Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation