From: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, pokurev(at)pm(dot)nttdata(dot)co(dot)jp, "Robert Haas (robertmhaas(at)gmail(dot)com)" <robertmhaas(at)gmail(dot)com> |
Subject: | Re: [PROPOSAL] VACUUM Progress Checker. |
Date: | 2015-11-20 01:29:52 |
Message-ID: | 564E7790.4000403@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2015/11/20 0:57, Jim Nasby wrote:
> On 11/19/15 1:18 AM, Amit Langote wrote:
>> 1) General purpose interface for (maintenance?) commands to report a set
>
> I'm surprised no one has picked up on using this for DML. Certainly anyone
> works with ETL processes would love to be able to get some clue on the
> status of a long running query...
Instrumenting query execution for progress info would be a complex beast
though. Also, what kind of reporting interface it would require is also
not clear, at least to me. Jan Urbanski's PGCon presentation[1] is a good
source on the matter I discovered in this thread, thanks! But IMHO, for
now, it would be worthwhile to focus our resources on the modest goal of
implementing a reporting interface for utility commands. Sure it would be
nice to investigate how much the requirements of the two overlap.
>
>> About pass 2, ie, lazy_index_vacuum() and
>> lazy_vacuum_heap(), I don't see how we can do better than reporting its
>> progress only after finishing all of it without any finer-grained
>> instrumentation. They are essentially block-box as far as the proposed
>> instrumentation approach is concerned. Being able to report progress per
>> index seems good but as a whole, a user would have to wait arbitrarily
>> long before numbers move forward. We might as well just report a bool
>> saying we're about to enter a potentially time-consuming index vacuum
>> round with possibly multiple indexes followed by lazy_vacuum_heap()
>> processing. Additionally, we can report the incremented count of the
>> vacuuming round (pass 2) once we are through it.
>
> Another option is to provide the means for the index scan routines to
> report their progress. Maybe every index AM won't use it, but it'd
> certainly be a lot better than staring at a long_running boolean.
The boolean would be a workaround for sure. I'm also slightly tempted by
the idea of instrumenting vacuum scans of individual index AM's bulkdelete
methods. One precedent is how vacuum_delay_point() are sprinkled around in
the code. Another problem to solve would be to figure out how to pass
progress parameters around - via some struct or could they be globals just
like VacuumCost* variables are...
>
>> Note that we can leave them out of
>> percent_done of overall vacuum progress. Until we have a good solution for
>> number (3) above, it seems to difficult to incorporate index pages into
>> overall progress.
>
> IMHO we need to either put a big caution sign on any % estimate that it
> could be wildly off, or just forgo it completely for now. I'll bet that if
> we don't provide it some enterprising users will figure out the best way
> to do this (similar to how the bloat estimate query has evolved over time).
>
> Even if we never get a % done indicator, just being able to see what
> 'position' a command is at will be very valuable.
Agreed. If we provide enough information in whatever view we choose to
expose, that would be a good start.
>
>> As someone pointed out upthread, the final heap truncate phase can take
>> arbitrarily long and is outside the scope of lazy_scan_heap() to
>> instrument. Perhaps a bool, say, waiting_heap_trunc could be reported for
>> the same. Note that, it would have to be reported from lazy_vacuum_rel().
>
> ISTM this is similar to the problem of reporting index status, namely that
> a progress reporting method needs to accept reports from multiple places
> in the code.
Yes.
Thanks,
Amit
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Stark | 2015-11-20 01:32:41 | Re: Using quicksort for every external sort run |
Previous Message | Amit Langote | 2015-11-20 01:25:20 | Re: Typo in file header comment of replorigindesc.c |