Re: [PROPOSAL] VACUUM Progress Checker.

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2015-07-22 12:24:21
Message-ID: CANP8+jLLLng=cc+jOcGzFPVy9H6Pg5Zv5s0eCM53obLb6GiPwg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 22 July 2015 at 13:00, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Jul 22, 2015 at 3:02 AM, Simon Riggs <simon(at)2ndquadrant(dot)com>
> wrote:
> > For me, the user workflow looks like these....
> >
> > Worried: "Task X is taking ages? When is it expected to finish?"
> > Ops: 13:50
> > <sometime later, about 14:00>
> > Worried: "Task X is still running? But I thought its ETA was 13:50?"
> > Ops: Now says 14:30
> > Worried: "Is it stuck, or is it making progress?"
> > Ops: Looks like its making progress
> > Worried: "Can we have a look at it and find out what its doing?"
>
> How does Ops know that it is making progress? Just because the
> completion percentage is changing?
>

You could, but that is not the way I suggested.

We need
* Some measure of actual progress (the definition of which may vary from
action to action, e.g. blocks scanned)
* Some estimate of the total work required
* An estimate of the estimated time of completion - I liked your view that
this prediction may be costly to request

> > In terms of VACUUM specifically: VACUUM should be able to assess
> beforehand
> > whether it will scan the indexes, or it can just assume that it will
> need to
> > scan the indexes. Perhaps VACUUM can pre-scan the VM to decide how big a
> > task it has before it starts.
>
> Well, we can assume that it will scan the indexes exactly once, but
> the actual number may be more or less; and the cost of rescanning the
> heap in phase 2 is also hard to estimate.
>
> Maybe I'm worrying over nothing, but I have a feeling that if we try
> to do what you're proposing here, we're gonna end up with this:
>
> https://xkcd.com/612/
>
> Most of the progress estimators I have seen over the ~30 years that
> I've been playing with computers have been unreliable, and many of
> those have been unreliable to the point of being annoying. I think
> that's likely to happen with what you are proposing too, though of
> course like all predictions of the future it could turn out to be
> wrong.

Almost like an Optimizer then. Important, often annoyingly wrong, needs
more work.

I'm not proposing this feature, I'm merely asking for it to be defined in a
way that makes it work for more than just VACUUM. Once we have a way of
reporting useful information, other processes can be made to follow that
mechanism, like REINDEX, ALTER TABLE etc.. I believe those things are
important, even if we never get such information for user queries. But I
hope we do.

I won't get in the way of your search for detailed information in more
complex forms. Both things are needed.

--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2015-07-22 13:52:52 stringify MAKE_SQLSTATE()
Previous Message Kouhei Kaigai 2015-07-22 12:13:44 Re: fdw_scan_tlist for foreign table scans breaks EPQ testing, doesn't it?