Re: [PROPOSAL] VACUUM Progress Checker.

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Syed, Rahila" <Rahila(dot)Syed(at)nttdata(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2015-08-11 09:43:27
Message-ID: CANP8+jLu7fpxSN=PwvRwmc7wFzV9VsJ0iDFs8NmKMn3miaALYg@mail.gmail.com
Lists: pgsql-hackers

On 10 August 2015 at 17:50, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:

> Masahiko Sawada wrote:
>
> > This topic may have been already discussed but, why don't we use just
> > total scanned pages and total pages?
>
> Because those numbers don't extrapolate nicely. If the density of dead
> tuples is irregular across the table, such absolute numbers might be
> completely meaningless: you could scan 90% of the table without seeing
> any index scan, and then at the final 10% be hit by many index scans
> cleaning dead tuples. Thus you would see progress go up to 90% very
> quickly and then take hours to have it go to 91%. (Additionally, and a
> comparatively minor point: since you don't know how many index scans are
> going to happen, there's no way to know the total number of blocks
> scanned, unless you don't count index blocks at all, and then the
> numbers become a lie.)
>
> If you instead track number of heap pages separately from index pages,
> and indicate how many index scans have taken place, you have a chance of
> actually figuring out how many heap pages are left to scan and how many
> more index scans will occur.
>

I think this overstates the difficulty.

Autovacuum knows what percentage of a table needs to be cleaned; that is how
it is triggered. When a vacuum runs, we should calculate how many TIDs we
will collect, and therefore how many trips to the indexes we need for the
given memory. We can use the visibility map (VM) to find out how many blocks
we will need to scan in the table. So overall, we know how many blocks we
need to scan.
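
As a rough sketch of the estimate described above (not code from any patch in this thread): the function name, its parameters, and the input numbers are hypothetical, and it assumes each dead TID costs 6 bytes of memory and that every index is scanned in full once per pass. It ignores the second visit to heap pages that contain dead tuples, and the fact that vacuum may still read some all-visible pages.

```python
import math

# Assumed memory cost of one dead tuple identifier (a 6-byte item pointer).
TID_BYTES = 6


def estimate_vacuum_blocks(heap_blocks, all_visible_blocks, est_dead_tuples,
                           index_blocks, work_mem_bytes):
    """Hypothetical helper: estimate how many blocks a vacuum will read.

    Returns (heap_blocks_to_scan, index_passes, total_blocks).
    """
    tids_per_pass = work_mem_bytes // TID_BYTES
    if tids_per_pass < 1:
        raise ValueError("work_mem_bytes is too small to hold a single TID")

    # Heap blocks marked all-visible in the VM are assumed to be skipped.
    heap_to_scan = heap_blocks - all_visible_blocks

    # Each time the TID array fills up, every index is scanned once.
    if est_dead_tuples > 0:
        index_passes = math.ceil(est_dead_tuples / tids_per_pass)
    else:
        index_passes = 0

    total = heap_to_scan + index_passes * sum(index_blocks)
    return heap_to_scan, index_passes, total


# Illustrative numbers only: 10000 heap blocks, 4000 of them all-visible,
# 2 million estimated dead tuples, two indexes, 6 MB of TID memory.
heap_to_scan, passes, total = estimate_vacuum_blocks(
    heap_blocks=10_000,
    all_visible_blocks=4_000,
    est_dead_tuples=2_000_000,
    index_blocks=[1_500, 500],
    work_mem_bytes=6_000_000,
)
print(heap_to_scan, passes, total)
```

With a total computed up front like this, progress reduces to scanned blocks divided by total blocks; how accurate it is depends entirely on how good the dead-tuple estimate is.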

I think just storing (total num blocks, scanned blocks) is sufficiently
accurate to be worth holding, rather than making it even more complex.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
