Re: VACUUM's ancillary tasks

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Vik Fearing <vik(at)2ndquadrant(dot)fr>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM's ancillary tasks
Date: 2016-10-01 21:47:38
Message-ID: CAEepm=3RJLbVxFTH9tyc82yR1gU_r6Q7q76xbgKAHo5p+HsmSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 2, 2016 at 9:34 AM, Vik Fearing <vik(at)2ndquadrant(dot)fr> wrote:
> On 10/01/2016 09:28 AM, Thomas Munro wrote:
>> On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik(at)2ndquadrant(dot)fr> wrote:
>>> The attached two patches scratch two itches I've been having for a
>>> while. I'm attaching them together because the second depends on the first.
>>>
>>> Both deal with the fact that [auto]vacuum has taken on more roles than
>>> its original purpose.
>>>
>>>
>>> Patch One: autovacuum insert-heavy tables
>>>
>>> If you have a table that mostly receives INSERTs, it will never get
>>> vacuumed because there are no (or few) dead rows. I have added an
>>> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
>>> the same way as "changes_since_analyze" does.
>>>
>>> The reason such a table needs to be vacuumed is currently twofold: the
>>> visibility map is not updated, slowing down index-only scans; and BRIN
>>> indexes are not maintained, rendering them basically useless.
>>
>> I'm aware of those two problems, but not very familiar with the
>> details. I don't feel qualified to say whether insert counting is the
>> best approach to the problem at this point. I looked into it a little
>> bit however, and had the following thoughts:
>>
>> About BRIN indexes: I couldn't find an explanation of why BRIN
>> indexes don't automatically create new summary tuples when you insert
>> a new tuple in an unsummarised page range. Is it deferred until
>> VACUUM time in order to untangle some otherwise unresolvable
>> interlocking or crash safety problem, or could that one day be done?
>> Assuming that it must be deferred for some technical reason and there
>> is no way around it, then I wonder if there is a more direct and
>> accurate way to figure out when it's necessary than counting inserts.
>> Counting inserts seems slightly bogus because you can't tell whether
>> those were inserts into an existing summarised block which is
>> self-maintaining or not. At first glance it looks a bit like
>> unsummarised ranges can only appear at the end of the table, is that
>> right? If so, couldn't you detect the number of unsummarised BRIN
>> blocks just by comparing the highest summarised BRIN block and the
>> current heap size?
>>
>> About visibility maps: How crazy would it be to estimate the number
>> of not-all-visible pages instead? It would be less work to count that
>> since it would only increase when the *first* tuple is inserted into a
>> page that is currently all visible (ie when the bit is cleared), not
>> for every tuple inserted into any page like your inserts_since_vacuum
>> counter. Another difference is that inserts_since_vacuum is reset
>> even if vacuum finds that it *can't* set the all-visible bit for a
>> given page yet because of some concurrent transaction. In that case
>> the bit is still not set but autovacuum has no reason to be triggered
>> again.
>
> Sure, I could handle each case separately, but the goal of this patch
> (as hinted at in the Subject) is to generalize all the different tasks
> we've been giving to VACUUM. The only missing piece is what the first
> patch addresses; which is insert-mostly tables never getting vacuumed.

Yeah, that makes sense. I just wanted to discuss what the ideal
launch conditions would be for those particular ancillary jobs, and
then figure out whether the difference matters. Generally, I think
changes to autovacuum heuristics need some consensus-building
discussion, especially in light of other related ideas from Jeff
Janes, and from people involved with BRIN and visibility map design,
including Simon who signed up as a reviewer. Since we're out of time
I'm going to move this to the November CF, and let's hear from them.

--
Thomas Munro
http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message John Gorman 2016-10-01 22:23:16 Re: PATCH: two slab-like memory allocators
Previous Message Emre Hasegeli 2016-10-01 20:39:39 Contains and is contained by operators of inet datatypes