Re: [Proposal] More Vacuum Statistics

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Naoya Anzai <nao-anzai(at)xc(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Akio Iwaasa <aki-iwaasa(at)vt(dot)jp(dot)nec(dot)com>, "bench(dot)coffee(at)gmail(dot)com" <bench(dot)coffee(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [Proposal] More Vacuum Statistics
Date: 2015-06-07 13:58:46
Message-ID: 55744E16.7030705@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 06/05/15 14:10, Naoya Anzai wrote:
> Thank you for quick feedback, and I'm sorry for slow response.
> All of your opinions were very helpful for me.
>
> I have confirmed Greg's Idea "Timing events".
> http://www.postgresql.org/message-id/509300F7.5000803@2ndQuadrant.com
>
> Greg said at first,
> "Parsing log files for commonly needed performance data is no fun."
> Yes, I completely agree with him.
>
> That looks a nice idea but I don't know why this idea has
> not been commited yet. Anybody knows?

Most likely lack of time, I guess.

>
> I have reworked my idea since I heard dear hacker's opinions.
>
> ====================
> pg_stat_vacuum view
> ====================
>
> I understand it is not good to simply add more counters in
> pg_stat_*_tables. For now, I'd like to suggest an extension
> which can confirm vacuum statistics like pg_stat_statements.

I don't see how you want to collect the necessary information from an
extension? pg_stat_statements get most of the stats from BufferUsage
structure, but vacuum keeps all this internal, AFAIK.

So it'd be necessary to make this somehow public - either by creating
something like BufferUsage with all the vacuum stats, or perhaps a set
of callbacks (either a single log_relation_vacuum or different callbacks
for tables and indexs).

IMHO the callbacks are a better idea - for example because it naturally
handles database-wide vacuum. The global structure makes this difficult,
because you'll only see data for all the vacuumed objects (or it'd have
to track per-object stats internally, somehow).

> VACUUM is a most important feature in PostgreSQL, but a
> special view for vacuum does not exist. Don't you think
> the fact is inconvenience? At least, I am disgruntled with
> that we need to parse pg_log for tune VACUUM.

+1

> My first design of pg_stat_vacuum view is following.
> (There are two views.)
>
> pg_stat_vacuum_table
> ---------------
> dbid
> schemaname
> relid
> relname
> elapsed
> page_removed
> page_remain
> page_skipped
> tuple_removed
> tuple_remain
> tuple_notremovable
> buffer_hit
> buffer_miss
> buffer_dirty
> avg_read
> avg_write
> vm_count
> vac_start
> vac_end
> is_autovacuum
>
> pg_stat_vacuum_index
> ---------------
> dbid
> shemaname
> relid
> indexrelid
> indexname
> elapsed
> num_index_tuples
> num_pages
> tuples_removed
> pages_deleted
> pages_free
> is_autovacuum
>
> At present, I think memory design of pg_stat_statements can
> divert into this feature.And I think this module needs to
> prepare following parameters like pg_stat_statements.

I'm not really sure about this.

Firstly, the very fist response from TL in this thread was that adding
per-table counters is not a particularly good idea, as it'll bloat the
statistics files. It's true you're not adding the data into the main
stats files, but you effectively establish a new 'vertical partition'
with one record per table/index. It might be worth the overhead, if it
really brings useful functionality (especially if it's opt-in feature,
like pg_stat_statements).

Secondly, the main issue of this design IMHO is that it only tracks the
very last vacuum run (or do I understand it wrong?). That means even if
you snapshot the pg_stat_vacuum views, you'll not know how many vacuums
executed in between (and the more frequently you snapshot that, the
greater the overhead). The other stats counters have the same issue, but
the snapshotting works a bit better because the counters are cumulative
(so you can easily do deltas etc.). But that's not the case here -
certainly not with the timestamps, for example.

I don't think the vacuum start/end timestamps are particularly
interesting, TBH - we already have them in pg_stat_all_tables anyway,
including the vacuum_count etc. So I'd propose dropping the timestamps,
possibly replacing them with a single 'elapsed time', and making all the
counters cumulative (so that you can do snapshots and deltas).

I'm also wondering whether this should track the vacuum costs (because
that determines how aggressive the vacuum is, and how much work will be
done in a particular time), if it was anti-wraparound vacuum, if there
was also ANALYZE performed, if the autovacuum was interrupted because of
user activity, etc.

> pg_stat_vacuum.max(integer)
> pg_stat_vacuum.save(boolean)
> pg_stat_vacuum.excluded_dbnames(text)
> pg_stat_vacuum.excluded_schemas(text)
> pg_stat_vacuum.min_duration(integer)
> ... and so on.
>
> To implement this feature, I have to collect each vacuum-stats
> every lazy_vacuum_* and I need to embed a hook function point
> where needed. (probably last point of lazy_vacuum_rel).
> Do you hesitate to add the hook only for this function?

Aha! So you plan to use the callbacks.

>
> Similar feature has been already provided by pg_statsinfo package.
> But it is a full-stack package for PG-stats and it needs to
> redesign pg_log and design a repository database for introduce.
> And it is not a core-extension for PostgreSQL.
> (I don't intend to hate pg_statsinfo,
> I think this package is a very convinient tool)
>
> Everyone will be able to do more easily tuning of VACUUM.
> That's all I want.

I'm still wondering whether these stats will really make the tuning any
easier. What I do right now is looking at pg_stat_all_tables.n_deat_tup
and if it exceeds some threshold, it's a sign that vacuum may need a bit
of tuning. Sometimes it really requires tuning vacuum itself, but more
often than not it's due to something else (a large bulk delete,
autovacuum getting stuck on another table, ...). I don't see how the new
stats would make this any easier.

Can you give some examples on how the new stats might be used (and where
the current stats are insufficient)? What use cases do you imagine for
those stats?

It might help differentiate the autovacuum activity from the rest of the
system (e.g. there's a lot of I/O going on - how much of that is coming
from autovacuum workers?). This would however require a more
fine-grained reporting, because often the vacuums run for a very long
time, especially on very large tables (which is exactly the case when
this might be handy) - I just had a VACUUM that ran for 12 hours. These
jobs should report the stats incrementally, not just once at the very
end, because that makes it rather useless IMNSHO.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2015-06-07 14:10:39 Re: [CORE] Restore-reliability mode
Previous Message Simon Riggs 2015-06-07 09:32:00 Re: Reducing tuple overhead