Re: Autovacuum on partitioned table (autoanalyze)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: yuzuko <yuzukohosoya(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2020-04-18 05:08:15
Message-ID: 20200418050815.GE26953@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 17, 2020 at 10:09:07PM +0900, Amit Langote wrote:
> On Thu, Apr 16, 2020 at 11:19 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > I don't think that adequately allows what's needed.
...(paragraph with my typos elided)...
> > For example, say a new customer has bunch of partitioned tables which each
> > currently have only one partition (for the current month), and that's expected
> > to grow to at least 20+ partitions (2+ years of history). How does one set the
> > partitioned table's auto-analyze parameters to analyze whenever any child is
> > analyzed ? I don't think it should be needed to update it every month after
> > computing sum(child tuples).
> >
> > Possibly you could allow that behavior for some special values of the
> > threshold. Like if autovacuum_analyze_threshold=-2, then analyze the parent
> > whenever any of its children are analyzed.
> >
> > I think that use case and that need would be common, but I'd like to hear what
> > others think.
>
> Having to constantly pay attention to whether a parent's
> analyze_threshold/scale_factor is working as intended would surely be
> an annoyance, so I tend to agree that we might need more than just the
> ability to set analyze_threshold/scale_factor on parent tables.
> However, I think we can at least start with being able to do
> *something* here. :) Maybe others think that this shouldn't be
> considered committable until we figure out a good analyze threshold
> calculation formula to apply to parent tables.
>
> Considering that, how about having, say, a
> autovacuum_analyze_partition_parent_frequency, with string values
> 'default', 'partition'? -- 'default' assumes the same formula as
> regular tables, whereas with 'partition', parent is analyzed as soon
> as a partition is.

I assume you mean a reloption to be applied only to partitioned tables,

Your "partition" setting would mean that the scale/threshold values would have
no effect, which seems kind of unfortunate.

I think it should be called something else, and done differently, like maybe:
autovacuum_analyze_mode = {off,sum,max,...}

The threshold would be threshold + scale*tuples, as always, but would be
compared with f(changes) as determined by the relopt.

sum(changes) would do what you called "default", comparing the sum(changes)
across all partitions to the threshold, which is itself computed using
sum(reltuples) AS reltuples.

max(changes) would compute max(changes) compared to the threshold, and the
threshold would be computed separately for each partition's reltuples:
threshold_N = parent_threshold + parent_scale * part_N_tuples. If *any*
partition exceeds that threshold, the partition itself is analyzed. This
allows what I want for time-series. Maybe this would have an alias called
"any".

I'm not sure if there's any other useful modes, like avg(changes)? I guess we
can add them later if someone thinks of a good use case.

Also, for me, the v7 patch warns:
|src/backend/postmaster/autovacuum.c:3117:70: warning: ‘reltuples’ may be used uninitialized in this function [-Wmaybe-uninitialized]
| vacinsthresh = (float4) vac_ins_base_thresh + vac_ins_scale_factor * reltuples;
..which seems to be a false positive, but easily avoided.

This patch includes partitioned tables in pg_stat_*_tables, which is great; I
complained awhile ago that they were missing [0]. It might be useful if that
part was split out into a separate 0001 patch (?).

Thanks,
--
Justin

[0] https://www.postgresql.org/message-id/20180601221428.GU5164%40telsasoft.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2020-04-18 06:16:48 Re: fixing old_snapshot_threshold's time->xid mapping
Previous Message Amit Kapila 2020-04-18 04:16:03 Re: WAL usage calculation patch