Re: [PROPOSAL] VACUUM Progress Checker.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, "Syed, Rahila" <Rahila(dot)Syed(at)nttdata(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] VACUUM Progress Checker.
Date: 2015-08-11 05:53:59
Message-ID: CAD21AoDT=rtXgyKdA-JLtpsDX8vxB6J51f2OEyqT8LKV57a7Xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 11, 2015 at 1:50 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Masahiko Sawada wrote:
>
>> This topic may have been already discussed but, why don't we use just
>> total scanned pages and total pages?
>
> Because those numbers don't extrapolate nicely. If the density of dead
> tuples is irregular across the table, such absolute numbers might be
> completely meaningless: you could scan 90% of the table without seeing
> any index scan, and then at the final 10% be hit by many index scans
> cleaning dead tuples. Thus you would see progress go up to 90% very
> quickly and then take hours to have it go to 91%. (Additionally, and a
> comparatively minor point: since you don't know how many index scans are
> going to happen, there's no way to know the total number of blocks
> scanned, unless you don't count index blocks at all, and then the
> numbers become a lie.)
> If you instead track number of heap pages separately from index pages,
> and indicate how many index scans have taken place, you have a chance of
> actually figuring out how many heap pages are left to scan and how many
> more index scans will occur.

Thank you for your explanation!
I understood about this.

> VACUUM has 3 phases now, but since phases 2 and 3 repeat, you can have an unbounded number of phases. But that assumes that we don't count truncation as a 4th phase of VACUUM...

In case of vacuum, I think we need to track the number of scanned heap
pages at least, and the information about index scan is the additional
information.
The another idea for displaying progress is to have two kind of
information: essential information and additional information.

Essential information has one numeric data, which is stored
essentially information regarding of its processing.
Additional information has two data: text and numeric. These data is
free-style data which is stored by each backend as it like.
And these three data are output at same time.

For example, In case of vacuum, essential information is the number of
total scanned heap page.

* When lazy_scan_heap starts, the two additional data are NULL.

* When lazy_vacuum_index starts, the backend set additional data like
followings.
- "Index vacuuming" into text data which describes what we're doing
now actually.
- "50" into numeric data which describes how many index pages we scanned.

* And when lazy_vacuum_index is done, backend sets additional data NULL again.

Regards,

--
Masahiko Sawada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-08-11 05:57:46 Re: max_connections and standby server
Previous Message Tatsuo Ishii 2015-08-11 05:47:18 Re: max_connections and standby server