Re: new autovacuum criterion for visible pages

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new autovacuum criterion for visible pages
Date: 2016-08-11 23:28:17
Message-ID: CAB7nPqTs_5tW7hBA6tU8PqSEH6tP51aHxoOsjNhZcNcNO9QHjA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 11, 2016 at 4:21 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Aug 11, 2016 at 3:29 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Aug 11, 2016 at 5:39 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>> I wanted to create a new relopt named something like
>>> autovacuum_vacuum_pagevisible_factor which would cause autovacuum to
>>> vacuum a table once less than a certain fraction of the relation's
>>> pages are marked allvisible.
>>
>> Interesting idea.
>>
>>> 1) One issue is that pg_class.relpages and pg_class.relallvisible are
>>> themselves only updated by vacuum/analyze. In the absence of manual
>>> vacuum or analyze, this means that if the new criterion uses those
>>> field, it could only kick in after an autoanalyze has already been
>>> done, which means that autovacuum_vacuum_pagevisible_factor could not
>>> meaningfully be set lower than autovacuum_analyze_scale_factor.
>>>
>>> Should relallvisible be moved/copied from pg_class to
>>> pg_stat_all_tables, so that it is maintained by the stats collector?
>>> Or should the autovacuum worker just walk the vm of every table with a
>>> defined autovacuum_vacuum_pagevisible_factor each time it is launched
>>> to get an up-to-date count that way?
>>
>> relation_needs_vacanalyze has access to Form_pg_class, so it is not a
>> problem to use the value of relallvisible there to decide if a
>> vacuum/analyze should be run.
>
> Doh. I missed your point. One idea perhaps would be to have an
> additional field that updates the number of pages having their VM bits
> cleared, or just decrement relallvisible when that happens, and use
> that in relation_needs_vacanalyze to do the decision-making. But that
> would require updating stats each time there is a VM cleared in heap
> operations, which would be really costly...
>
> The optimizer does not depend directly on pgstat when fetching the
> estimation information it needs, so it may be wiser to not add this
> dependency, and one can disable pgstat_track_counts so moving this
> information out of pg_class is not a good idea.

With a somewhat fresher mind...

The main issue regarding this proposal can be summarized as that: as
track_counts can be disabled by users so moving relallvisible into
pgstat cannot be done except if I am missing something. The VM bits
cleared need to be tracked either by decrementing
pg_class.relallvisible, with a different counter in pg_class, or with
a completely different mechanism. Still I am scared of overall
performance impact because as the VM bit clearing is quite spread so
pg_class or the new relation where this is tracked would become really
bloated.

In short, autovacuum will need to scan by itself the VM of each
relation and decide based on that. I would not expect much performance
impact, but disabling that by default would have no impact on existing
deployments.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2016-08-11 23:45:29 Re: [Patch] New psql prompt substitution %r (m = master, r = replica)
Previous Message Peter Geoghegan 2016-08-11 22:37:56 Re: condition variables