Re: monitoring CREATE INDEX [CONCURRENTLY]

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: monitoring CREATE INDEX [CONCURRENTLY]
Date: 2019-01-09 06:46:10
Message-ID: CABOikdOmY9Fjh9syoT-=_1q0_PZH9vNbsyU3rd-ExtzE5HxtDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 1, 2019 at 6:09 AM Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> For discussion, here's an preliminary patch. This is just a first
> skeleton; needs to grow a lot of flesh yet, per my previous proposal.
> As far as the generic CREATE INDEX stuff goes, I think this is complete;
> it's missing the AM-specific bits.
>

Looks like it's missing the validate_index blocks-scanned report, which is
not AM-specific and your original proposal does mention that.

I thought a bit about index_build part. If most AMs follow a somewhat
standard phases while building an index, it might be simpler to define
those phases and have AM agnostic code report those phases. Or may be just
report the most significant information, instead of reporting each
sub-phase of index_build. I think the most important progress to know would
be how far the heap is scanned for to-be-indexed tuples. AFAICS all AMs
use IndexBuildHeapScan() to scan the heap. Can we simply do some reporting
from that routine? Like number of blocks scanned against the total number
of blocks requested?

Some minor comments on the patch, though I suspect you might be already
updating the patch since you marked it as WIP.

+CREATE VIEW pg_stat_progress_create_index AS
+ SELECT
+ s.pid AS pid, S.datid AS datid, D.datname AS datname,
+ S.relid AS relid,
+ CASE s.param1 WHEN 0 THEN 'initializing'
+ WHEN 1 THEN 'waiting for lockers 1'
+ WHEN 2 THEN 'building index'
+ WHEN 3 THEN 'waiting for lockers 2'
+ WHEN 4 THEN 'validating index'
+ WHEN 5 THEN 'waiting for lockers 3'

Can we have more descriptive text for waiters? Such as "waiting for existing
writers", "waiting for intermediate writers" and "waiting for old readers".
Not
sure if I got those correct, something of that sort will definitely give
more
insight into what the transaction is waiting for.

Can we actually also report the list of transactions the command is waiting
on?
That could be useful to the user if CIC appears to be stuck too long.

+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+ nparts);
+

IMHO we should just use full term INDEX instead of IDX, such as
PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
of
extra characters won't make a difference. I did not see much precedence to
shortern to IDX for INDEX elsewhere in the code (though we tend to do that
for
variable names etc).

@@ -1282,6 +1318,9 @@ DefineIndex(Oid relationId,
old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
&n_old_snapshots);
+ pgstat_progress_update_param(PROGRESS_CREATEIDX_PHASE,
+ PROGRESS_CREATEIDX_PHASE_WAIT_3);
+ pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, n_old_snapshots);

I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
when the wait phase is over, to avoid any confusion. For example, I noticed
that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Shouldn't PROGRESS_WAITFOR_DONE be updated when we skip a snapshot in the
code below?
if (!VirtualTransactionIdIsValid(old_snapshots[i]))
continue; /* found uninteresting in previous cycle */

@@ -2817,7 +2818,7 @@ FastPathGetRelationLockEntry(LOCALLOCK *locallock)
* uses of the result.
*/
VirtualTransactionId *
-GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode)
+GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)

Could that out variable be named something differently? "countp" or
something
like that. I did not check if there is some practice that we follow, but I
remember suffixing with "p" rather than prefixing with "o" (for out I
assume)

+
+/* Progress parameters for CREATE INDEX */
+#define PROGRESS_CREATEIDX_PHASE 0
+/* 1 and 2 reserved for "waitfor" metrics */
+#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
+#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
+

Is there a reason to leave those reserve placeholders, only to fill them a
few
lines down?

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2019-01-09 06:51:09 unique, partitioned index fails to distinguish index key from INCLUDEd columns
Previous Message Pavel Stehule 2019-01-09 06:11:03 Re: proposal: variadic argument support for least, greatest function