Re: Autovacuum on partitioned table (autoanalyze)

From: yuzuko <yuzukohosoya(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2020-09-17 08:12:36
Message-ID: CAKkQ50_D=1LbYLD9qtWaKxPvRq3T_qGWfQov5eH5rycbMEq-pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Horiguchi-san,

Thank you for reviewing.

On Tue, Sep 15, 2020 at 7:01 PM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Tue, 25 Aug 2020 14:28:20 +0200, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote in
> > > I attach the latest patch that solves the above Werror.
> > > Could you please check it again?
> >
> > This version now pass the tests in the Travis pipeline as can be seen in the
> > link below, and is ready to be reviewed in the upcoming commitfest:
> >
> > http://cfbot.cputube.org/yuzuko-hosoya.html
>
> At Mon, 6 Jul 2020 19:35:37 +0900, yuzuko <yuzukohosoya(at)gmail(dot)com> wrote in
> > I think there are other approaches like Tom's idea that Justin previously
> > referenced, but this patch works the same way as previous patches.
> > (tracks updated/inserted/deleted tuples and checks whether the partitioned
> > tables needs auto-analyze, same as nonpartitioned tables)
> > Because I wanted to be able to analyze partitioned tables by autovacuum
> > as a first step, and I think this approach is the simplest way to do it.
>
> I'm not sure if anything bad happen if parent and children are not
> agree on statistics.
>
> The requirement suggested here seems to be:
>
> - We want to update parent's stats when any of its children gets its
> stats updated. This is curucial especially for time-series
> partitioning.
>
> - However, we don't want analyze the whole-tree every time any of the
> children was analyzed.
>
> To achieve the both, stats-merging seems to the optimal solution.
>
> Putting that aside, I had a brief look on the latest patch.
>
> /* We only count stats for things that have storage */
> - if (!RELKIND_HAS_STORAGE(relkind))
> + if (!RELKIND_HAS_STORAGE(relkind) ||
> + relkind == RELKIND_PARTITIONED_TABLE)
> {
> rel->pgstat_info = NULL;
>
> RELKIND_HAS_STORAGE(RELKIND_PARTITIONED_TABLE) is already false.
> Maybe you wanted to do "&& relkind !=" instead:p
>
Oh, indeed. I'll fix it.

>
> + /*
> + * If this relation is partitioned, we store all ancestors' oid
> + * to propagate its changed_tuples to their parents when this
> + * transaction is committed.
> + */
> + if (rel->rd_rel->relispartition && pgstat_info->ancestors == NULL)
>
> If the relation was detached then attached to another partition within
> a transaction, the ancestor list would get stale and the succeeding
> modification to the relation propagates into wrong ancestors.
>
> I think vacuum time is more appropriate to modify ancestors stats. It
> seems to me that what Alvalo pointed isthe list-order-susceptible
> manner of collecting children's modified tuples.
>
I proposed a patch that modified ancestors stats when vacuuming previously.
In that time, having been pointed out by Alvaro and Amit, I tried to update the
parents' changes_since_analyze in every ANALYZE. However, in that case,
the problem mentioned in [1] occurred, but I could not find a way to avoid it.
I think that it can be solved by updating the parents' changes_since_analyze
only in the case of auto analyze, but what do you think?

>
> + ? 0 /* partitioned tables don't have any data, so it's 0 */
>
> If the comment is true, we shouldn't have non-zero t_changed_tuples,
> too. I think the reason for the lines is something different.
>
Yes, surely. I think updating the values of live_tuples and dead_tuples
is confusing for users. I'll consider another comment.

[1] https://www.postgresql.org/message-id/CAKkQ50-bwFEDMBGb1JmDXffXsiU8xk-hN6kJK9CKjdBa7r%3DHdw%40mail.gmail.com
--
Best regards,
Yuzuko Hosoya

NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-09-17 08:15:30 Re: pgindent vs dtrace on macos
Previous Message Michael Paquier 2020-09-17 07:51:01 Re: SQL:2011 PERIODS vs Postgres Ranges?