Re: monitoring CREATE INDEX [CONCURRENTLY]

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: monitoring CREATE INDEX [CONCURRENTLY]
Date: 2019-03-11 12:41:23
Message-ID: 20190311124123.GA7486@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Rahila

On 2019-Mar-11, Rahila Syed wrote:

> On Tue, 5 Mar 2019 at 08:32, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:

> > +extern char *btbuildphasename(int64 phasenum);
>
> 1. I think int64 is too large a datatype for phasenum.
> Also int32 is used for phasenum in pg_indexam_progress_phasename().
> Can we have it as int8?

It does look strange, I agree, and the first code I wrote had it using a
smaller type. However, I later realized that since the value comes
directly from pg_stat_get_progress_info(), which returns int8 values, it
was pointless to only accept a small fraction of the possible values for
no good reason, so I widened it to int64 as you see now.

> 2.
> . In validate_index_heapscan, we dont calculate blocks_done similar to
> IndexBuildHeapScan i.e
> block_done += scan->rs_cblock - previous_blkno which IMO is more accurate.
> Reporting scan->rs_cblock as blocks_done might give slightly inaccurate
> results as we are
> still processing that block.

Thanks for pointing out that there's an off-by-one bug there (should be
cblock-1). However, IndexBuildHeapScan uses more complicated code
because it needs to cover for two additional things that
validate_index_heapscan doesn't: parallel heapscans and synchronized
seqscans. We could do that, I just saw no point in it.

> 3. There is no if(progress) check in validate_index()/
> validate_index_heapscan() code. Wont it be a problem if it is called from
> other index methods which dont support reporting progress at the
> moment?

Good question. I'll have a look. Most likely, I'll end up having
things so that building an index using an unsupported index AM reports
progress based on IndexBuildHeapScan / validate_index /
validate_index_heapscan ... which might mean I should remove the
'progress' parameter from there and have them report unconditionally.

> 4. Just to clarify my understanding can you please see below comment
>
> Quoting your following comment in cluster command progress monitor thread
> while referring to progress reporting from IndexBuildHeapScan,
>
> "One, err, small issue with that idea is that we need the param numbers
> not to conflict for any "progress update providers" that are to be used
> simultaneously by any command."
>
> Does that mean that we can't have any other INDEX progress monitoring, use
> PROGRESS_SCAN_BLOCKS_TOTAL and PROGRESS_SCAN_BLOCKS_DONE
> parameter numbers to report anything but the metrics they report now.

What I mean is that the literal parameter numbers defined as
PROGRESS_SCAN_BLOCKS_DONE/TOTAL may not be used for other parameters by
commands that call IndexBuildHeapScan, if those other parameters are
used by the same commands simultaneously with IndexBuildHeapScan. So
those parameter numbers become "reserved".

> 5.
>
> > 15:56:44.682 | building index (3 of 8): initializing (1/5) |
> > 442478 | 442399 | 0 | 0 | 0 |
>
> I wonder how is the phase 'building index(3 of 8): initializing(1/5)' when
> the blocks_done count is increasing. Shouldn't it have
> changed to reflect PROGRESS_BTREE_PHASE_INDEXBUILD_HEAPSCAN as building
> index(3 of 8): table scan(2/5) ?
> Although I think it has been rectified in the latest patch as I now get
> 'table scan' phase in output as I do CREATE INDEX on table with 1000000
> records

Yeah, this was a bug that I fixed in v5. (It was a misunderstanding
about how parallel scanning is set up, IIRC). For v5, I tested both
parallel and non-parallel builds, with and without sync seqscans, and
everything seemed to behave correctly.

Thanks for looking! I intend to post a new version later this week.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2019-03-11 12:57:21 Re: Should we increase the default vacuum_cost_limit?
Previous Message Dean Rasheed 2019-03-11 11:58:17 Re: [Patch] Log10 and hyperbolic functions for SQL:2016 compliance