Re: Autovacuum on partitioned table (autoanalyze)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: yuzuko <yuzukohosoya(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2020-04-16 14:19:11
Message-ID: 20200416141910.GM2228@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 16, 2020 at 06:16:45PM +0900, yuzuko wrote:
> > I think it ought to be possible to configure this feature such that an
> > auto-analyze on any child partition would trigger analyze of the parent. I
> > think that would be important for maintaining accurate stats of the partition
> > key column for many cases involving RANGE-partitioned tables, which are likely
> > to rely on histogram rather than MCVs.
>
> I read your previous email and understand that it would be neccesary to analyze
> partitioned tables automatically when any of its children are analyzed. In my
> first patch, auto-analyze on partitioned tables worked like this but there were
> some comments about performance of autovacuum, especially when partitioned
> tables have a lot of children.

I reread that part. There was also confusion between autovacuum vacuum and
autovacuum analyze.

I agree that it *might* be a problem to analyze the parent every time any child
is analyzed.

But it might also be what's needed for this feature to be useful.

> The latest patch lets users set different autovacuum configuration for
> each partitioned
> tables like this,
> create table p3(i int) partition by range(i) with
> (autovacuum_analyze_scale_factor=0.0005, autovacuum_analyze_threshold=100);
> so users can configure those parameters according to partitioning strategies
> and other requirements.
>
> So I think this patch can solve problem you mentioned.

I don't think that adequately allows what's needed.

I think it out to be possible to get the "analyze parent whenever a child is
analyzed" behavior easily, without having to compute new thershold parameters
every time one adds partitions, detaches partitions, loades 10x more data into
one of the partitions, load only 10% as much data into the latest partition,
etc.

For example, say a new customer has bunch of partitioned tables which each
currently have only one partition (for the current month), and that's expected
to grow to at least 20+ partitions (2+ years of history). How does one set the
partitioned table's auto-analyze parameters to analyze whenever any child is
analyzed ? I don't think it should be needed to update it every month after
computing sum(child tuples).

Possibly you could allow that behavior for some special values of the
threshold. Like if autovacuum_analyze_threshold=-2, then analyze the parent
whenever any of its children are analyzed.

I think that use case and that need would be common, but I'd like to hear what
others think.

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hamlin, Garick L 2020-04-16 14:20:53 Re: cleaning perl code
Previous Message Tom Lane 2020-04-16 13:53:46 Re: cleaning perl code