Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Unifying VACUUM VERBOSE and log_autovacuum_min_duration output
Date: 2021-12-22 22:19:16
Message-ID: CAH2-Wzk1OD8tPuGV58HeY1V4itREUsddr59CB4x2otoZ1_3Xeg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 21, 2021 at 9:46 PM Greg Stark <stark(at)mit(dot)edu> wrote:
> Or rather I think a better way to look at it is that the progress
> output for the operator should be separated from the metrics logged.
> As an operator what I want to see is some progress indicator
> "starting table scan", "overflow at x% of table scanned, starting
> index scan", "processing index 1" "index 2"... so I can have some idea
> of how much longer the vacuum will take and see whether I need to
> raise maintenance_work_mem and by how much. I don't need to see all
> the metrics while it's running.

We have the pg_stat_progress_vacuum view for that these days, of
course, which has the advantage of working with autovacuum and
manually-run VACUUMs in exactly the same way. I am generally opposed
to any difference between autovacuum and manual VACUUM that isn't
clearly necessary. For example, ANALYZE behaves very differently in a
VACUUM ANALYZE run on a table with a GIN index in autovacuum -- that
seems awful to me.
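
As a rough illustration of the pg_stat_progress_vacuum point, here is a
minimal Python sketch of the kind of progress indicator an operator could
build on top of the view. The column names are the ones documented for
the view in Postgres 14; the PROGRESS_QUERY constant, the
describe_progress() helper, and the sample row are hypothetical names and
values made up for this example, not anything that ships with Postgres.

```python
# Query against pg_stat_progress_vacuum. The column names are taken from
# the Postgres 14 documentation for the view; PROGRESS_QUERY itself is a
# hypothetical name used only in this sketch.
PROGRESS_QUERY = """
SELECT pid,
       relid::regclass AS table_name,
       phase,
       heap_blks_total,
       heap_blks_scanned,
       heap_blks_vacuumed,
       index_vacuum_count
FROM pg_stat_progress_vacuum
"""


def describe_progress(row):
    """Format one row of the view (as a dict) into a one-line summary.

    This is an illustrative helper, not part of any Postgres client API.
    """
    total = row["heap_blks_total"]
    scanned = row["heap_blks_scanned"]
    # Guard against an empty table, where heap_blks_total is 0.
    pct = 100.0 * scanned / total if total else 0.0
    return (
        f"pid {row['pid']} {row['table_name']}: {row['phase']}, "
        f"{pct:.1f}% of heap scanned, "
        f"{row['index_vacuum_count']} index vacuum cycle(s) so far"
    )


# Made-up sample row, shaped like a result of PROGRESS_QUERY.
sample = {
    "pid": 4242,
    "table_name": "public.t",
    "phase": "scanning heap",
    "heap_blks_total": 1000,
    "heap_blks_scanned": 250,
    "heap_blks_vacuumed": 0,
    "index_vacuum_count": 0,
}
print(describe_progress(sample))
```

In practice the rows would come from running PROGRESS_QUERY through
whatever client driver is in use and polling it periodically. The same
query covers autovacuum workers and manually-run VACUUMs alike, since the
view does not distinguish between them.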

> 2) I don't much like the format. I want to be able to parse the output
> with awk or mtail or even just grep for relevant lines. Things like
> "index scan not needed" make it hard to parse since you don't know
> what it will look like if they are needed. I would have expected
> something like "index scans: 0" which is actually already there up
> above. I'm not clear how this line is meant to be read. Is it just
> explaining *why* the index scan was skipped? It would just be missing
> entirely if it wasn't skipped?

No, a line that looks very much like the "index scan not needed" line
will always be there. IOW there will reliably be a line that explains
whether or not any index scan took place, and why (or why not).
Whereas there won't ever be a line in VACUUM VERBOSE (as currently
implemented) that tells you about something that might have been
expected to happen, but didn't actually happen.

The same thing cannot be said for every line of the log output,
though. For example, the line about I/O timings only appears with
track_io_timing=on.

I have changed things here quite a bit in the last year. I do try to
stick to the "always show line" convention, if only for the benefit of
humans. If the line doesn't generalize to every situation, then I tend
to doubt that it merits appearing in the summary in the first place.

> Fwiw, having it be parsable is why I wouldn't want it to be multiple
> ereports. That would mean it could get interleaved with other errors
> from other backends. That would be a disaster.

That does seem relevant, but honestly I haven't made that a goal here.

Part of the problem has been with what we've actually shown. Postgres
14 was the first version to separately report on the number of LP_DEAD
line pointers in the table (or left behind in the table when we didn't
do index vacuuming). Prior to 14 we only reported dead tuples. These
seemed to be assumed to be roughly equivalent in the past, but
actually they're totally different things, with many practical
consequences:

https://www.postgresql.org/message-id/flat/CAH2-WzkkGT2Gt4XauS5eQOQi4mVvL5X49hBTtWccC8DEqeNfKA%40mail.gmail.com#b7bd96573a2ca27b023ce78b4a8c2b13

This means that we only just started showing one particular metric
that is of fundamental importance in this log output (and VACUUM
VERBOSE). We also used to show things that had very little relevance,
with slightly different (confusingly similar) metrics shown in each
variant of the instrumentation (a problem that I'm trying to
permanently avoid by unifying everything). While things have improved
a lot here recently, I don't think that things have fully settled yet
-- the output will probably change quite a bit more in Postgres 15.
That makes me a little hesitant to promise very much about making the
output parseable or stable.

That said, I don't want to make parsing the output needlessly difficult.

--
Peter Geoghegan
