Re: [PROPOSAL] VACUUM Progress Checker.

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2015-07-21 20:24:22
Message-ID: CA+Tgmoa4a4VcRZMjgL5wv=2S8ExYU-HMkfg=abf3ekO-q6OU7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 30, 2015 at 4:32 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Yes, I suggest just a single column on pg_stat_activity called pct_complete
>
> trace_completion_interval = 5s (default)
>
> Every interval, we report the current % complete for any operation that
> supports it. We just show NULL if the current operation has not reported
> anything or never will.

I am deeply skeptical about the usefulness of a progress-reporting
system that can only report one number. I think that, in many cases,
it won't be possible to compute an accurate completion percentage, and
even if it is, people may want more data than that for various reasons
anyway.

For example, in the case of VACUUM, suppose there is a table with
1,000,000 heap pages and 200,000 index pages (so it's probably
over-indexed, but whatever). After reading 500,000 heap pages, we
have found 0 dead tuples. What percentage of the work have we
finished?

It's hard to say. If we don't find any dead tuples, we've read half
the pages we will eventually read and are therefore half done. But if
we find even 1 dead tuple, then we've got to scan all 200,000 index
pages, so we've read only 41.7% of the pages we'll eventually touch.
If we find so many dead tuples that we have to scan the indexes
multiple times for lack of maintenance_work_mem, we'll eventually read
1,000,000 + 200,000k pages, where k is the number of index scans; if
say k = 5 then we are only 25% done. All of these scenarios are
plausible because, in all likelihood, the dirty pages in the table are
concentrated near the end.

Now we could come up with ways of making good guesses about what is
likely to happen. We could look at the data from pg_stat_all_tables,
historical results of vacuuming this table, the state of the
visibility map, and so on. And that all might help. But it's going
to be fairly hard to produce a percentage of completion that is
monotonically increasing and always accurately reflects the time
remaining. Even if we can do it, it doesn't seem like a stretch to
suppose that sometimes people will want to look at the detail data.
Instead of getting told "we're X% done" (according to some arcane
formula), it's quite reasonable to think that people will want to get
a bunch of values, e.g.:

1. For what percentage of heap pages have we completed phase one (HOT
prune + mark all visible if appropriate + freeze + remember dead
tuples)?
2. For what percentage of heap pages have we completed phase two (mark
line pointers unused)?
3. What percentage of maintenance_work_mem are we currently using to
remember tuples?

For a query, the information we want back is likely to be even more
complicated; e.g. EXPLAIN output with row counts and perhaps timings
to date for each plan node. We can insist that all status reporting
get boiled down to one number, but I suspect we would be better off
asking ourselves how we could let commands return a richer set of
data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-07-21 20:47:00 Re: BRIN index and aborted transaction
Previous Message Peter Geoghegan 2015-07-21 20:06:43 Eliminating CREATE INDEX comparator TID tie-breaker overhead