Re: [patch] CLUSTER blocks scanned progress reporting

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [patch] CLUSTER blocks scanned progress reporting
Date:	2020-11-24 15:25:03
Message-ID:	CAEze2WiepWFv+6Egv5BmmGJHDXdMY3dwRV+pjwd0mfTjsciQwQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, 24 Nov 2020 at 15:05, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
> On 2020/11/21 2:32, Matthias van de Meent wrote:
> > Hi,
> >
> > The pg_stat_progress_cluster view can report incorrect
> > heap_blks_scanned values when synchronize_seqscans is enabled, because
> > it allows the sequential heap scan to not start at block 0. This can
> > result in wraparounds in the heap_blks_scanned column when the table
> > scan wraps around, and starting the next phase with heap_blks_scanned
> > != heap_blks_total. This issue was introduced with the
> > pg_stat_progress_cluster view.
>
> Good catch! I agree that this is a bug.
>
> >
> > The attached patch fixes the issue by accounting for a non-0
> > heapScan->rs_startblock and calculating the correct number with a
> > non-0 heapScan->rs_startblock in mind.
>
> Thanks for the patch! It basically looks good to me.

Thanks for the feedback!

> It's a bit of a waste of cycles to calculate and update the number of
> scanned blocks on every cycle. So I'm inclined to change the code as follows.
> Thoughts?
>
> +    BlockNumber prev_cblock = InvalidBlockNumber;
> <snip>
> +    if (prev_cblock != heapScan->rs_cblock)
> +    {
> +        pgstat_progress_update_param(PROGRESS_CLUSTER_HEAP_BLKS_SCANNED,
> +                                     (heapScan->rs_cblock +
> +                                      heapScan->rs_nblocks -
> +                                      heapScan->rs_startblock
> +                                     ) % heapScan->rs_nblocks + 1);
> +        prev_cblock = heapScan->rs_cblock;
> +    }

That seems quite reasonable.
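
To make the arithmetic in the quoted fragment concrete, here is a minimal
standalone sketch of the same modular calculation. It is not taken from the
patch: blks_scanned() is a hypothetical helper name, and the widening to
64 bits is my own precaution against overflow in the addition, not something
the patch does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* same width as PostgreSQL's BlockNumber */

/*
 * Hypothetical helper (not a PostgreSQL function): number of blocks the
 * scan has reached so far, given the current block, the block the scan
 * started at, and the total number of blocks. The scan is assumed to
 * wrap from block nblocks - 1 back to block 0.
 */
static uint64_t
blks_scanned(BlockNumber cblock, BlockNumber startblock, BlockNumber nblocks)
{
    assert(nblocks > 0);
    assert(cblock < nblocks && startblock < nblocks);

    /* Adding nblocks before subtracting keeps the value non-negative. */
    return ((uint64_t) cblock + nblocks - startblock) % nblocks + 1;
}
```

For a 10-block table scanned from block 7, this gives 1 at block 7, 3 at
block 9, 4 after wrapping to block 0, and 10 at block 6, so the value
increases monotonically even though the block number wraps.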

I noticed that with my proposed patch it is still possible to enter
the next phase while heap_blks_scanned != heap_blks_total. This can
happen when the final heap pages contain only dead tuples, so no tuple
is returned from the last heap page(s) of the scan. Because
heapScan->rs_cblock is set to InvalidBlockNumber when the scan is
finished (see heapam.c#1060-1072), I think it would be correct to set
heap_blks_scanned to heapScan->rs_nblocks at the end of the scan
instead.
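
As a rough illustration of that case, the sketch below simulates a wrapping
scan in which the counter is only updated when a block returns a tuple. It
is a toy model under my own assumptions, not PostgreSQL code:
simulate_scan(), report_blks_scanned() and the has_live_tuple array are all
hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical stand-in for the reported progress counter. */
static uint64_t reported_blks_scanned;

static void
report_blks_scanned(uint64_t value)
{
    reported_blks_scanned = value;
}

/*
 * Toy model of a scan over nblocks blocks that starts at startblock and
 * wraps around. has_live_tuple[b] says whether block b returns any tuple;
 * progress is only reported when a tuple is returned. If finalize is true,
 * nblocks is reported once the scan has finished.
 */
static uint64_t
simulate_scan(const bool *has_live_tuple, BlockNumber nblocks,
              BlockNumber startblock, bool finalize)
{
    report_blks_scanned(0);

    for (BlockNumber i = 0; i < nblocks; i++)
    {
        BlockNumber cblock =
            (BlockNumber) (((uint64_t) startblock + i) % nblocks);

        if (has_live_tuple[cblock])
            report_blks_scanned((uint64_t) i + 1);
    }

    if (finalize)
        report_blks_scanned(nblocks);   /* end of scan: all blocks visited */

    return reported_blks_scanned;
}
```

With six blocks, a start at block 4, and only dead tuples in blocks 2 and 3
(the last two visited), the model ends at 4 without the final update and at
6 with it.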

Please find attached a patch applying the suggested changes.

Matthias van de Meent

Attachment	Content-Type	Size
v2-0001-Fix-CLUSTER-progress-reporting-of-number-of-block.patch	text/x-patch	2.9 KB

In response to

Re: [patch] CLUSTER blocks scanned progress reporting at 2020-11-24 14:05:47 from Fujii Masao

Responses

Re: [patch] CLUSTER blocks scanned progress reporting at 2020-11-25 09:35:41 from Fujii Masao
