Re: Autovacuum on partitioned table (autoanalyze)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: yuzukohosoya(at)gmail(dot)com
Cc: daniel(at)yesql(dot)se, pryzby(at)telsasoft(dot)com, amitlangote09(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, masahiko(dot)sawada(at)2ndquadrant(dot)com, laurenz(dot)albe(at)cybertec(dot)at, pgsql-hackers(at)lists(dot)postgresql(dot)org, stark(at)mit(dot)edu
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2020-10-23 11:23:27
Message-ID: 20201023.202327.1472218651388751446.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks you for the new version.

At Fri, 23 Oct 2020 15:12:51 +0900, yuzuko <yuzukohosoya(at)gmail(dot)com> wrote in
> Hello,
>
> I reconsidered a way based on the v5 patch in line with
> Horiguchi-san's comment.
>
> This approach is as follows:
> - A partitioned table is checked whether it needs analyze like a plain
> table in relation_needs_vacanalyze(). To do this, we should store
> partitioned table's stats (changes_since_analyze).
> - Partitioned table's changes_since_analyze is updated when
> analyze a leaf partition by propagating its changes_since_analyze.
> In the next scheduled analyze time, it is used in the above process.
> That is, the partitioned table is analyzed behind leaf partitions.
> - The propagation process differs between autoanalyze or plain analyze.
> In autoanalyze, a leaf partition's changes_since_analyze is propagated
> to *all* ancestors. Whereas, in plain analyze on an inheritance tree,
> propagates to ancestors not included the tree to avoid needless counting.
>
> Attach the latest patch to this email.
> Could you check it again please?

+ /*
+ * Get its all ancestors to propagate changes_since_analyze count.
+ * However, when ANALYZE inheritance tree, we get ancestors of
+ * toprel_oid to avoid needless counting.
+ */
+ if (!OidIsValid(toprel_oid))
+ ancestors = get_partition_ancestors(RelationGetRelid(rel));
+ else
+ ancestors = get_partition_ancestors(toprel_oid);

This comment doesn't explaining what the code intends but what the
code does.

The reason for the difference is that if we have a valid toprel_oid,
we analyze all descendants of the relation this time, and if we
propagate the number to the descendants of the top relation, the next
analyze on the relations could happen shortly than expected.

- msg.m_live_tuples = livetuples;
- msg.m_dead_tuples = deadtuples;
+ msg.m_live_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ ? 0 /* if this is a partitioned table, skip modifying */
+ : livetuples;
+ msg.m_dead_tuples = (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ ? 0 /* if this is a partitioned table, skip modifying */
+ : deadtuples;

Two successive branching with the same condition looks odd. And we
need an explanation of *why* we don't set the values for partitioned
tables.

+ foreach(lc, ancestors)
+ {
+ Oid parentOid = lfirst_oid(lc);
+ Relation parentrel;
+
+ parentrel = table_open(parentOid, AccessShareLock);

I'm not sure, but all of the ancestors always cannot be a parent (in
other words, a parent of a parent of mine is not a parent of
mine). Isn't just rel sufficient?

- * Report ANALYZE to the stats collector, too. However, if doing
- * inherited stats we shouldn't report, because the stats collector only
- * tracks per-table stats. Reset the changes_since_analyze counter only
- * if we analyzed all columns; otherwise, there is still work for
- * auto-analyze to do.
+ * Report ANALYZE to the stats collector, too. Reset the
+ * changes_since_analyze counter only if we analyzed all columns;
+ * otherwise, there is still work for auto-analyze to do.
*/
- if (!inh)
+ if (!inh || onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
pgstat_report_analyze(onerel, totalrows, totaldeadrows,

This still rejects traditional inheritance nonleaf relations. But if
we remove the description about that completely in the comment above,
we should support traditional inheritance parents here. I think we
can do that as far as we need to propagate only per-tuple stats (that
mans not per-attribute) like changes_since_analyze.

Whichever way we take, do we need the description about the behavior
in the documentation?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-10-23 11:32:49 Re: Enumize logical replication message actions
Previous Message Sridhar N Bamandlapally 2020-10-23 11:09:15 git clone failed in windows