Re: [PROPOSAL] VACUUM Progress Checker.

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, <pokurev(at)pm(dot)nttdata(dot)co(dot)jp>, "Robert Haas (robertmhaas(at)gmail(dot)com)" <robertmhaas(at)gmail(dot)com>
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2015-11-19 15:57:46
Message-ID: 564DF17A.4050705@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/19/15 1:18 AM, Amit Langote wrote:
> 1) General purpose interface for (maintenance?) commands to report a set

I'm surprised no one has picked up on using this for DML. Certainly
anyone works with ETL processes would love to be able to get some clue
on the status of a long running query...

> About pass 2, ie, lazy_index_vacuum() and
> lazy_vacuum_heap(), I don't see how we can do better than reporting its
> progress only after finishing all of it without any finer-grained
> instrumentation. They are essentially block-box as far as the proposed
> instrumentation approach is concerned. Being able to report progress per
> index seems good but as a whole, a user would have to wait arbitrarily
> long before numbers move forward. We might as well just report a bool
> saying we're about to enter a potentially time-consuming index vacuum
> round with possibly multiple indexes followed by lazy_vacuum_heap()
> processing. Additionally, we can report the incremented count of the
> vacuuming round (pass 2) once we are through it.

Another option is to provide the means for the index scan routines to
report their progress. Maybe every index AM won't use it, but it'd
certainly be a lot better than staring at a long_running boolean.

> Note that we can leave them out of
> percent_done of overall vacuum progress. Until we have a good solution for
> number (3) above, it seems to difficult to incorporate index pages into
> overall progress.

IMHO we need to either put a big caution sign on any % estimate that it
could be wildly off, or just forgo it completely for now. I'll bet that
if we don't provide it some enterprising users will figure out the best
way to do this (similar to how the bloat estimate query has evolved over
time).

Even if we never get a % done indicator, just being able to see what
'position' a command is at will be very valuable.

> As someone pointed out upthread, the final heap truncate phase can take
> arbitrarily long and is outside the scope of lazy_scan_heap() to
> instrument. Perhaps a bool, say, waiting_heap_trunc could be reported for
> the same. Note that, it would have to be reported from lazy_vacuum_rel().

ISTM this is similar to the problem of reporting index status, namely
that a progress reporting method needs to accept reports from multiple
places in the code.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Big Mike 2015-11-19 15:58:35 Re: Foreign Data Wrapper
Previous Message Tom Lane 2015-11-19 15:57:27 Re: [PROPOSAL] TAP test example