vacuum verbose detail logs are unclear (show debug lines at *start* of each stage?)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: vacuum verbose detail logs are unclear (show debug lines at *start* of each stage?)
Date: 2019-12-20 17:11:32
Message-ID: 20191220171132.GB30414@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This is a usability complaint. If one knows enough about vacuum and/or
logging, I'm sure there's no issue.

Right now vacuum shows:

| 1 postgres=# VACUUM t;
| 2 DEBUG: vacuuming "public.t"
| 3 DEBUG: scanned index "t_i_key" to remove 999 row versions
| 4 DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
| 5 DEBUG: "t": removed 999 row versions in 5 pages
| 6 DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
| 7 DEBUG: index "t_i_key" now contains 999 row versions in 11 pages
| 8 DETAIL: 999 index row versions were removed.
| 9 0 index pages have been deleted, 0 are currently reusable.
| 10 CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
| 11 DEBUG: "t": found 999 removable, 999 nonremovable row versions in 9 out of 9 pages
| 12 DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 130886944
| 13 There were 0 unused item identifiers.
| 14 Skipped 0 pages due to buffer pins, 0 frozen pages.
| 15 0 pages are entirely empty.
| 16 CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
| 17 VACUUM

2: line showing action to be performed on table - good
3-4: line showing action which WAS performed on index, but only after it's done
5-6: line showing actions conditionally performed on table, but only after it's done
7-10: line showing status of on index, but only after it's done
11-16: line showing status of table; unconditional; good

I'm proposing to output a message before 3, 5, and 7, like in the attached.
The messages are just placeholders; if there's any agreement this is an
improvement, I'll accept suggestions for better content.

This is confusing, at least to me. For example, the rusage output is shown
numerous times (depending on the number of indices and dead tuples). I (at
least) tend to think think that a past-tense followed by an "elapsed" time
indicates that the process is neary done, and maybe just waiting on a fsync or
maybe some other synchronization. If one sees multiple indexes output quickly,
you can infer that the process is looping over them. When it's done with the
indexes, it starts another phase, but doesn't say that (or what).

#2 "vacuuming" line shows no rusage, and it's not very clear that the rusage
"DETAIL" in line#3 (in this example) applies to line#2 "scanning index", so
it's easy to think that the output is reporting that the whole command took
0.00s elapsed, which is irritating when the command hasn't yet finished.

Another example from CSV logs, to show log times (keep in mind that VACUUM
VERBOSE is less clear than the logfile, which has the adantage of a separate
column for DETAIL).

| 1 2019-12-16 09:59:22.568+10 | vacuuming "public.alarms" |
| 2 2019-12-16 09:59:47.662+10 | scanned index "alarms_active_idx" to remove 211746 row versions | CPU: user: 0.22 s, system: 0.00 s, elapsed: 0.46 s
| 3 2019-12-16 09:59:48.036+10 | scanned index "alarms_displayable_idx" to remove 211746 row versions | CPU: user: 0.22 s, system: 0.00 s, elapsed: 0.37 s
| 4 2019-12-16 09:59:48.788+10 | scanned index "alarms_raw_current_idx" to remove 211746 row versions | CPU: user: 0.28 s, system: 0.00 s, elapsed: 0.75 s
| 5 2019-12-16 09:59:51.379+10 | scanned index "alarms_alarm_id_linkage_back_idx" to remove 211746 row versions | CPU: user: 1.04 s, system: 0.05 s, elapsed: 2.59 s
| 6 2019-12-16 09:59:53.75+10 | scanned index "alarms_alarm_id_linkage_idx" to remove 211746 row versions | CPU: user: 0.99 s, system: 0.08 s, elapsed: 2.37 s
| 7 2019-12-16 09:59:56.473+10 | scanned index "alarms_pkey" to remove 211746 row versions | CPU: user: 1.11 s, system: 0.08 s, elapsed: 2.72 s
| 8 2019-12-16 10:00:35.142+10 | scanned index "alarms_alarm_time_idx" to remove 211746 row versions | CPU: user: 0.94 s, system: 0.08 s, elapsed: 38.66 s
| 9 2019-12-16 10:00:37.002+10 | scanned index "alarms_alarm_clear_time_idx" to remove 211746 row versions | CPU: user: 0.72 s, system: 0.08 s, elapsed: 1.85 s
| 10 2019-12-16 10:03:57.42+10 | "alarms": removed 211746 row versions in 83923 pages | CPU: user: 10.24 s, system: 2.28 s, elapsed: 200.41 s
| 11 2019-12-16 10:03:57.425+10 | index "alarms_active_idx" now contains 32 row versions in 1077 pages | 57251 index row versions were removed. +
| 13 2019-12-16 10:03:57.426+10 | index "alarms_raw_current_idx" now contains 1495 row versions in 1753 pages | 96957 index row versions were removed. +
| 15 2019-12-16 10:03:57.426+10 | index "alarms_displayable_idx" now contains 32 row versions in 1129 pages | 55220 index row versions were removed. +
| 16 2019-12-16 10:03:57.427+10 | index "alarms_pkey" now contains 2269786 row versions in 9909 pages | 197172 index row versions were removed. +
| 17 2019-12-16 10:03:57.427+10 | index "alarms_alarm_time_idx" now contains 2269791 row versions in 10306 pages | 211745 index row versions were removed. +
| 17 2019-12-16 10:03:57.427+10 | index "alarms_alarm_id_linkage_idx" now contains 2269786 row versions in 11141 pages | 211746 index row versions were removed. +
| 19 2019-12-16 10:03:57.427+10 | index "alarms_alarm_id_linkage_back_idx" now contains 2269786 row versions in 11352 pages | 211746 index row versions were removed. +
| 20 2019-12-16 10:03:57.428+10 | index "alarms_alarm_clear_time_idx" now contains 2269791 row versions in 9875 pages | 166886 index row versions were removed. +
| 21 2019-12-16 10:03:57.43+10 | "alarms": found 9534 removable, 1093069 nonremovable row versions in 211956 out of 430749 pages | 1 dead row versions cannot be removed yet, oldest xmin: 133809389+
| | | There were 562588 unused item identifiers. +
| | | Skipped 0 pages due to buffer pins, 7066 frozen pages. +
| | | 0 pages are entirely empty. +
| | | CPU: user: 17.85 s, system: 5.40 s, elapsed: 274.86 s.
| 22 2019-12-16 10:03:58.795+10 | "pg_toast_17781": removed 28 row versions in 7 pages | CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
| 23 2019-12-16 10:03:58.795+10 | scanned index "pg_toast_17781_index" to remove 28 row versions | CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.07 s
| 24 2019-12-16 10:03:58.816+10 | index "pg_toast_17781_index" now contains 1503 row versions in 7 pages | 23 index row versions were removed. +
| 25 2019-12-16 10:03:58.816+10 | "pg_toast_17781": found 23 removable, 1503 nonremovable row versions in 375 out of 375 pages | 0 dead row versions cannot be removed yet, oldest xmin:

#9 shows result of action performed on index, followed by 3.5 minutes of
silence... This isn't very amusing when the last output says "elapsed: 1.85s",
and when you don't know how many "elapsed" lines to expect (as bad as any
progress bar with multiple phases).

Another approach would be to somehow make it more clear (for vacuum or in
general) that the "detail" line is associated with the preceding output.

Justin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-12-20 17:24:46 Re: Created feature for to_date() conversion using patterns 'YYYY-WW', 'YYYY-WW-D', 'YYYY-MM-W' and 'YYYY-MM-W-D'
Previous Message Robert Haas 2019-12-20 17:11:31 Re: Clarifying/rationalizing Vars' varno/varattno/varnoold/varoattno