Re: Autovacuum on partitioned table (autoanalyze)

From: yuzuko <yuzukohosoya(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Amit Langote <amitlangote09(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2021-04-01 11:34:53
Message-ID: CAKkQ508EzAG9ib8rRSNozuYhB1e6p_4FzwrG=YRTW2fENT2+BA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tomas,

Thank you for reviewing the patch.

> Firstly, the patch propagates the changes_since_analyze values from
> do_analyze_rel, i.e. from the worker after it analyzes the relation.
> That may easily lead to cases with unnecessary analyzes - consider a
> partitioned with 4 child relations:
> [ explanation ]
>
I didn't realize that till now. Indeed, this approach increments parent's
changes_since_analyze counter according to its leaf partition's counter
when the leaf partition is analyzed, so it will cause unnecessary ANALYZE
on partitioned tables as you described.

> I propose a different approach - instead of propagating the counts in
> do_analyze_rel for individual leaf tables, let's do that in bulk in
> relation_needs_vacanalyze. Before the (existing) first pass over
> pg_class, we can add a new one, propagating counts from leaf tables to
> parents.
>
Thank you for your suggestion. I think it could solve all the issues
you mentioned. I modified the patch based on this approach:

- Create a new counter, PgStat_Counter changes_since_analyze_reported,
to track changes_since_analyze we already propagated to ancestors.
This is used for internal processing and users may not need to know it.
So this counter is not displayed at pg_stat_all_tables view for now.

- Create a new function, pgstat_propagate_changes() which propagates
changes_since_analyze counter to all ancestors and saves
changes_since_analyze_reported. This function is called in
do_autovacuum() before relation_needs_vacanalyze().

> Note: I do have some ideas about how to improve that, I've started a
> separate thread about it [1].
>
I'm also interested in merging children's statistics for partitioned tables
because it will make ANALYZE on inheritance trees more efficient.
So I'll check it later.

> I forgot to mention one additional thing yesterday - I wonder if we need
> to do something similar after a partition is attached/detached. That can
> also change the parent's statistics significantly, so maybe we should
> handle all partition's rows as changes_since_analyze? Not necessarily
> something this patch has to handle, but might be related.
>
Regarding attached/detached partitions, I think we should update statistics
of partitioned tables according to the new inheritance tree. The latest patch
hasn't handled this case yet, but I'll give it a try soon.

Attach the v13 patch to this email. Could you please check it again?

--
Best regards,
Yuzuko Hosoya
NTT Open Source Software Center

Attachment Content-Type Size
v13_autovacuum_on_partitioned_table.patch application/octet-stream 21.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-04-01 12:02:09 Re: Failed assertion on standby while shutdown
Previous Message houzj.fnst@fujitsu.com 2021-04-01 10:41:30 RE: Hybrid Hash/Nested Loop joins and caching results from subplans