More stats about skipped vacuums

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: More stats about skipped vacuums
Date: 2017-10-10 10:26:16
Message-ID: 20171010.192616.108347483.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.
Once in a while I am asked about table bloat. In most cases the
cause is long lasting transactions and vacuum canceling in some
cases. Whatever the case users don't have enough clues to why
they have bloated tables.

At the top of the annoyances list for users would be that they
cannot know whether autovacuum decided that a table needs vacuum
or not. I suppose that it could be shown in pg_stat_*_tables.

n_mod_since_analyze | 20000
+ vacuum_requred | true
last_vacuum | 2017-10-10 17:21:54.380805+09

If vacuum_required remains true for a certain time, it means that
vacuuming stopped halfway or someone is killing it repeatedly.
That status could be shown in the same view.

n_mod_since_analyze | 20000
+ vacuum_requred | true
last_vacuum | 2017-10-10 17:21:54.380805+09
last_autovacuum | 2017-10-10 17:21:54.380805+09
+ last_autovacuum_status | Killed by lock conflict

Where the "Killed by lock conflict" would be one of the followings.

- Completed (oldest xmin = 8023)
- May not be fully truncated (yielded at 1324 of 6447 expected)
- Truncation skipped
- Skipped by lock failure
- Killed by lock conflict

If we want more formal expression, we can show the values in the
following shape. And adding some more values could be useful.

n_mod_since_analyze | 20000
+ vacuum_requred | true
+ last_vacuum_oldest_xid | 8023
+ last_vacuum_left_to_truncate | 5123
+ last_vacuum_truncated | 387
last_vacuum | 2017-10-10 17:21:54.380805+09
last_autovacuum | 2017-10-10 17:21:54.380805+09
+ last_autovacuum_status | Killed by lock conflict
...
autovacuum_count | 128
+ incomplete_autovacuum_count | 53

# The last one might be needless..

Where the "Killed by lock conflict" is one of the followings.

- Completed
- Truncation skipped
- Partially truncated
- Skipped
- Killed by lock conflict

This seems enough to find the cause of a table bloat. The same
discussion could be applied to analyze but it might be the
another issue.

There may be a better way to indicate the vacuum soundness. Any
opinions and suggestions are welcome.

I'm going to make a patch to do the 'formal' one for the time
being.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2017-10-10 11:07:40 Re: [POC] hash partitioning
Previous Message Aleksander Alekseev 2017-10-10 10:26:13 Re: [BUGS] 10.0: Logical replication doesn't execute BEFORE UPDATE OF <columns> trigger