Re: Autovacuum on partitioned table

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: yuzuko <yuzukohosoya(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table
Date: 2020-01-28 11:27:15
Message-ID: CA+fd4k5396AYZo5XLM=sRvQctC+CetMRL_CdfSWqAS_mcF0XKw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 28 Jan 2020 at 17:52, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> Hello,
>
> On Fri, Dec 27, 2019 at 2:02 PM Masahiko Sawada
> <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
> > On Fri, 27 Dec 2019 at 12:37, yuzuko <yuzukohosoya(at)gmail(dot)com> wrote:
> > > As Laurenz commented in this thread, I tried adding option
> > > to update parent's statistics during Autovacuum. To do that,
> > > I propose supporting 'autovacuum_enabled' option already
> > > exists on partitioned tables.
> > >
> > > In the attached patch, you can use 'autovacuum_enabled' option
> > > on partitioned table as usual, that is, a default value of this option
> > > is true. So if you don't need autovacuum on a partitioned table,
> > > you have to specify the option:
> > > CREATE TABLE p(i int) partition by range(i) with (autovacuum_enabled=0);
> > >
> > > I'm not sure but I wonder if a suitable value as a default of
> > > 'autovacuum_enabled' for partitioned tables might be false.
> > > Because autovacuum on *partitioned tables* requires scanning
> > > all children to make partitioned tables' statistics.
> > > But if the default value varies according to the relation,
> > > is it confusing? Any thoughts?
> >
> > I don't look at the patch deeply yet but your patch seems to attempt
> > to vacuum on partitioned table. IIUC partitioned tables don't need to
> > be vacuumed and its all child tables are vacuumed instead if we pass
> > the partitioned table to vacuum() function. But autovacuum on child
> > tables is normally triggered since their statistics are updated.
> >
> > I think it's a good idea to have that option but I think that doing
> > autovacuum on the parent table every time when autovacuum is triggered
> > on one of its child tables is very high cost especially when there are
> > a lot of child tables. Instead I thought it's more straight forward if
> > we compare the summation of the statistics of child tables (e.g.
> > n_live_tuples, n_dead_tuples etc) to vacuum thresholds when we
> > consider the needs of autovacuum on the parent table. What do you
> > think?
>
> There's this old email where Tom outlines a few ideas about triggering
> auto-analyze on inheritance trees:
>
> https://www.postgresql.org/message-id/4823.1262132964%40sss.pgh.pa.us
>
> If I'm reading that correctly, the idea is to track only
> changes_since_analyze and none of the finer-grained stats like
> live/dead tuples for inheritance parents (partitioned tables) using
> some new pgstat infrastrcture, an idea that Hosoya-san also seems to
> be considering per an off-list discussion. Besides the complexity of
> getting that infrastructure in place, an important question is whether
> the current system of applying threshold and scale factor to
> changes_since_analyze should be used as-is for inheritance parents
> (partitioned tables), because if users set those parameters similarly
> to for regular tables, autovacuum might analyze partitioned tables
> more than necessary.

How are you going to track changes_since_analyze of partitioned table?
It's just an idea but we can accumulate changes_since_analyze of
partitioned table by adding child tables's value after analyzing each
child table. And compare the partitioned tables value to the threshold
that is computed by (autovacuum_analyze_threshold + total rows
including all child tables * autovacuum_analyze_scale_factor).

> By the way, maybe I'm misunderstanding what Sawada-san wrote above,
> but the only missing piece seems to be a way to trigger an *analyze*
> on the parent tables -- to collect optimizer statistics for the
> inheritance trees -- not vacuum, for which the existing system seems
> enough.

Right. We need only autoanalyze on partitioned tables.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-01-28 11:36:41 Re: Expose lock group leader pid in pg_stat_activity
Previous Message Heikki Linnakangas 2020-01-28 11:12:05 Re: ReadRecord wrongly assumes randAccess after 38a957316d.