Re: Autovacuum on partitioned table (autoanalyze)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: daniel(at)yesql(dot)se
Cc: yuzukohosoya(at)gmail(dot)com, pryzby(at)telsasoft(dot)com, amitlangote09(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, masahiko(dot)sawada(at)2ndquadrant(dot)com, laurenz(dot)albe(at)cybertec(dot)at, pgsql-hackers(at)lists(dot)postgresql(dot)org, stark(at)mit(dot)edu
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2020-09-15 10:01:35
Message-ID: 20200915.190135.1065752215377064117.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 25 Aug 2020 14:28:20 +0200, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote in
> > I attach the latest patch that solves the above Werror.
> > Could you please check it again?
>
> This version now pass the tests in the Travis pipeline as can be seen in the
> link below, and is ready to be reviewed in the upcoming commitfest:
>
> http://cfbot.cputube.org/yuzuko-hosoya.html

At Mon, 6 Jul 2020 19:35:37 +0900, yuzuko <yuzukohosoya(at)gmail(dot)com> wrote in
> I think there are other approaches like Tom's idea that Justin previously
> referenced, but this patch works the same way as previous patches.
> (tracks updated/inserted/deleted tuples and checks whether the partitioned
> tables needs auto-analyze, same as nonpartitioned tables)
> Because I wanted to be able to analyze partitioned tables by autovacuum
> as a first step, and I think this approach is the simplest way to do it.

I'm not sure if anything bad happen if parent and children are not
agree on statistics.

The requirement suggested here seems to be:

- We want to update parent's stats when any of its children gets its
stats updated. This is curucial especially for time-series
partitioning.

- However, we don't want analyze the whole-tree every time any of the
children was analyzed.

To achieve the both, stats-merging seems to the optimal solution.

Putting that aside, I had a brief look on the latest patch.

/* We only count stats for things that have storage */
- if (!RELKIND_HAS_STORAGE(relkind))
+ if (!RELKIND_HAS_STORAGE(relkind) ||
+ relkind == RELKIND_PARTITIONED_TABLE)
{
rel->pgstat_info = NULL;

RELKIND_HAS_STORAGE(RELKIND_PARTITIONED_TABLE) is already false.
Maybe you wanted to do "&& relkind !=" instead:p

+ /*
+ * If this relation is partitioned, we store all ancestors' oid
+ * to propagate its changed_tuples to their parents when this
+ * transaction is committed.
+ */
+ if (rel->rd_rel->relispartition && pgstat_info->ancestors == NULL)

If the relation was detached then attached to another partition within
a transaction, the ancestor list would get stale and the succeeding
modification to the relation propagates into wrong ancestors.

I think vacuum time is more appropriate to modify ancestors stats. It
seems to me that what Alvalo pointed isthe list-order-susceptible
manner of collecting children's modified tuples.

+ ? 0 /* partitioned tables don't have any data, so it's 0 */

If the comment is true, we shouldn't have non-zero t_changed_tuples,
too. I think the reason for the lines is something different.

# Oops! Time's up now.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2020-09-15 10:07:29 Re: ECPG: proposal for new DECLARE STATEMENT
Previous Message lchch1990@sina.cn 2020-09-15 09:30:22 Re: Unnecessary delay in streaming replication due to replay lag