Re: VACUUM's ancillary tasks

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Vik Fearing <vik(at)2ndquadrant(dot)fr>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM's ancillary tasks
Date: 2016-10-01 07:28:29
Message-ID: CAEepm=2rWqt1mhT9XwEPUdvcZN_f8qb7+CaejU2D34R-VG8b5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik(at)2ndquadrant(dot)fr> wrote:
> The attached two patches scratch two itches I've been having for a
> while. I'm attaching them together because the second depends on the first.
>
> Both deal with the fact that [auto]vacuum has taken on more roles than
> its original purpose.
>
>
> Patch One: autovacuum insert-heavy tables
>
> If you have a table that mostly receives INSERTs, it will never get
> vacuumed because there are no (or few) dead rows. I have added an
> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
> the same way as "changes_since_analyze" does.
>
> The reason such a table needs to be vacuumed is currently twofold: the
> visibility map is not updated, slowing down index-only scans; and BRIN
> indexes are not maintained, rendering them basically useless.

I'm aware of those two problems, but not very familiar with the
details. I don't feel qualified to say whether insert counting is the
best approach to the problem at this point. I looked into it a little
bit however, and had the following thoughts:

About BRIN indexes: I couldn't find an explanation of why BRIN
indexes don't automatically create new summary tuples when you insert
a new tuple in an unsummarised page range. Is it deferred until
VACUUM time in order to untangle some otherwise unresolvable
interlocking or crash safety problem, or could that one day be done?
Assuming that it must be deferred for some technical reason and there
is no way around it, then I wonder if there is a more direct and
accurate way to figure out when it's necessary than counting inserts.
Counting inserts seems slightly bogus because you can't tell whether
those were inserts into an existing summarised block which is
self-maintaining or not. At first glance it looks a bit like
unsummarised ranges can only appear at the end of the table, is that
right? If so, couldn't you detect the number of unsummarised BRIN
blocks just by comparing the highest summarised BRIN block and the
current heap size?

About visibility maps: How crazy would it be to estimate the number
of not-all-visible pages instead? It would be less work to count that
since it would only increase when the *first* tuple is inserted into a
page that is currently all visible (ie when the bit is cleared), not
for every tuple inserted into any page like your inserts_since_vacuum
counter. Another difference is that inserts_since_vacuum is reset
even if vacuum finds that it *can't* set the all-visible bit for a
given page yet because of some concurrent transaction. In that case
the bit is still not set but autovacuum has no reason to be triggered
again.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2016-10-01 09:52:34 Re: [PATCH] Generic type subscription
Previous Message Pavel Raiskup 2016-10-01 06:20:14 [PATCH] parallel & isolated makefile for plpython