Re: [HACKERS] More stats about skipped vacuums

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: sawada(dot)mshk(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] More stats about skipped vacuums
Date: 2017-11-18 13:23:20
Message-ID: CAB7nPqQV1Emkj=5VFzui250T6v+xcpRQ2RfHu_oQMbdXnZw3mA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 16, 2017 at 7:34 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Wed, 15 Nov 2017 16:13:01 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqQm_WCKuUf5RD0CzeMuMO907ZPKP7mBh-3t2zSJ9jn+PA(at)mail(dot)gmail(dot)com>
>> pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
>> + pg_stat_get_vacuum_necessity(C.oid) AS vacuum_required,
>> pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
>> pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
>> pg_stat_get_last_analyze_time(C.oid) as last_analyze,
>> pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
>> pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
>> Please use spaces instead of tabs. Indentation is not consistent.
>
> Done. Thank you for pointing. (whitespace-mode showed me some
> similar inconsistencies at the other places in the file...)

Yes, I am aware of those which get introduced here and there. Let's
not make things worse..

>> + case PGSTAT_VACUUM_CANCELED:
>> + phase = tabentry->vacuum_last_phase;
>> + /* number of elements of phasestr above */
>> + if (phase >= 0 && phase <= 7)
>> + result = psprintf("%s while %s",
>> + status == PGSTAT_VACUUM_CANCELED ?
>> + "canceled" : "error",
>> + phasestr[phase]);
>> Such complication is not necessary. The phase parameter is updated by
>> individual calls of pgstat_progress_update_param(), so the information
>> showed here overlaps with the existing information in the "phase"
>> field.
>
> The "phase" is pg_stat_progress_vacuum's? If "complexy" means
> phasestr[phase], the "phase" cannot be overlap with
> last_vacuum_status since pg_stat_progress_vacuum's entry has
> already gone when someone looks into pg_stat_all_tables and see a
> failed vacuum status. Could you give a bit specific comment?

I mean that if you tend to report this information, you should just
use a separate column for it. Having a single column report two
informations, which are here the type of error and potentially the
moment where it appeared are harder to parse.

>> However, progress reports are here to allow users to do decisions
>> based on the activity of how things are working. This patch proposes
>> to add multiple new fields:
>> - oldest Xmin.
>> - number of index scans.
>> - number of pages truncated.
>> - number of pages that should have been truncated, but are not truncated.
>> Among all this information, as Sawada-san has already mentioned
>> upthread, the more index scans the less dead tuples you can store at
>> once, so autovacuum_work_mem ought to be increases. This is useful for
>> tuning and should be documented properly if reported to give
>> indications about vacuum behavior. The rest though, could indicate how
>> aggressive autovacuum is able to remove tail blocks and do its work.
>> But what really matters for users to decide if autovacuum should be
>> more aggressive is tracking the number of dead tuples, something which
>> is already evaluated.
>
> Hmm. I tend to agree. Such numbers are better to be shown as
> average of the last n vacuums or maximum. I decided to show
> last_vacuum_index_scan only and I think that someone can record
> it continuously to elsewhere if wants.

As a user, what would you make of those numbers? How would they help
in tuning autovacuum for a relation? We need to clear up those
questions before thinking if there are cases where those are useful.

>> Tracking the number of failed vacuum attempts is also something
>> helpful to understand how much the job is able to complete. As there
>> is already tracking vacuum jobs that have completed, it could be
>> possible, instead of logging activity when a vacuum job has failed, to
>> track the number of *begun* jobs on a relation. Then it is possible to
>> guess how many have failed by taking the difference between those that
>> completed properly. Having counters per failure types could also be a
>> possibility.
>
> Maybe pg_stat_all_tables is not the place to hold such many kinds
> of vacuum specific information. pg_stat_vacuum_all_tables or
> something like?

What do you have in mind? pg_stat_all_tables already includes counters
about the number of vacuums and analyze runs completed. I guess that
the number of failures, and the types of failures ought to be similar
counters at the same level.

>> For this commit fest, I would suggest a patch that simply adds
>> tracking for the number of index scans done, with documentation to
>> give recommendations about parameter tuning. i am switching the patch
>> as "waiting on author".
>
> Ok, the patch has been split into the following four parts. (Not
> split by function, but by the kind of information to add.)
> The first one is that.
>
> 0001. Adds pg_stat_all_tables.last_vacuum_index_scans. Documentation is added.
>
> 0002. Adds pg_stat_all_tables.vacuum_required. And primitive documentation.
>
> 0003. Adds pg_stat_all_tables.last_vacuum_status/autovacuum_fail_count
> plus primitive documentation.
>
> 0004. truncation information stuff.
>
> One concern on pg_stat_all_tables view is the number of
> predefined functions it uses. Currently 20 functions and this
> patch adds more seven. I feel it's better that at least the
> functions this patch adds are merged into one function..

For the scope of this commit fest, why not focusing only on 0001 with
the time that remains? This at least is something I am sure will be
useful.

<para>
+ Vacuuming scans all index pages to remove index entries that pointed
+ the removed tuples. In order to finish vacuuming by as few index
+ scans as possible, the removed tuples are remembered in working
+ memory. If this setting is not large enough, vacuuming runs
+ additional index scans to vacate the memory and it might cause a
+ performance problem. That behavior can be monitored
+ in <xref linkend="pg-stat-all-tables-view">.
+ </para>
Why not making that the third paragraph, after autovacuum_work_mem has
been mentioned for the first time? This could be reworded as well.
Short idea:
Vacuum scans all index pages to remove index entries that pointed to
dead tuples. Finishing vacuum with a minimal number of index scans
reduces the time it takes to complete it, and a new scan is triggered
once the in-memory storage for dead tuple pointers gets full, whose
size is defined by autovacuum_work_mem. So increasing this parameter
can make the operation finish more quickly. This can be monitored with
pg_stat_all_tables.

pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
+ pg_stat_get_last_vacuum_index_scans(C.oid) AS
last_vacuum_index_scans,
pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
Counters with counters, and last vacuum info with last vacuum info, no?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-11-18 13:36:46 Re: Speed up the removal of WAL files
Previous Message Michael Paquier 2017-11-18 11:32:55 Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256