Re: Autovacuum on partitioned table (autoanalyze)

From: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Andres Freund" <andres(at)anarazel(dot)de>
Cc: "Kyotaro HORIGUCHI" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, yuzuko <yuzukohosoya(at)gmail(dot)com>, "Tomas Vondra" <tomas(dot)vondra(at)enterprisedb(dot)com>, "David Steele" <david(at)pgmasters(dot)net>, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com>, "Justin Pryzby" <pryzby(at)telsasoft(dot)com>, "Daniel Gustafsson" <daniel(at)yesql(dot)se>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, "Masahiko Sawada" <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "Laurenz Albe" <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Greg Stark" <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2021-08-13 18:50:16
Message-ID: 0794d7ca-5183-486b-9c5e-6d434867cecd@www.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is a proposal for 14. This patch has four main changes:

* The mod counts are only propagated to the topmost parent, not to each ancestor. This means that we'll only analyze the topmost partitioned table and not each intermediate partitioned table; seems a good compromise to avoid sampling all partitions multiple times per round.

* One pgstat message is sent containing many partition/parent pairs, not just one. This reduces the number of messages sent. 123 partitions fit in one message (messages are 1000 bytes). This is done once per autovacuum worker run, so it shouldn't be too bad.

* There's a sleep between sending the message and re-reading stats. It would be great to have a mechanism by which pgstat collector says "I've received and processed up to this point", but we don't have that; what we can do is sleep PGSTAT_STAT_INTERVAL and then reread the file, so we're certain that the file we read is at least as new as that time. This is far longer than it takes to process the messages. Note that if the messages do take longer than that to be processed by the collector, it's not a big loss anyway; those tables will be processed by the next autovacuum run.

* I changed vacuum_expand_rel to put the main-rel OID at the end. (This is a variation of Horiguchi-san proposed patch; instead of making the complete list be in the opposite order, it's just that one OID that appears at the other end). This has the same effect as his patch: any error reports thrown by vacuum/analyze mention the first partition rather than the main table. This part is in 0002 and I'm not totally convinced it's a sane idea.

Minor changes:
* I reduced autovacuum from three passes over pg_class to two passes, per your observation that we can acquire toast association together with processing partitions, and then use that in the second pass to collect everything.

* I moved the catalog-accessing code to partition.c, so we don't need to have pgstat.c doing it.

Some doc changes are pending, and some more commentary in parts of the code, but I think this is much more sensible. I do lament the lack of a syscache for pg_inherits.

Attachment Content-Type Size
0001-Propagate-counts-up-only-to-topmost-ancestor.patch text/x-patch 19.0 KB
0002-Have-expand_vacuum_rel-put-the-parent-table-last.patch text/x-patch 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Zhang 2021-08-13 23:33:22 Re: [UNVERIFIED SENDER] Re: [bug] Logical Decoding of relation rewrite with toast does not reset toast_hash
Previous Message David G. Johnston 2021-08-13 18:47:45 Re: Default to TIMESTAMP WITH TIME ZONE?