Re: document the need to analyze partitioned tables

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, yuzuko <yuzukohosoya(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: document the need to analyze partitioned tables
Date: 2021-09-13 04:38:50
Message-ID: CALNJ-vTV+BPhyCD9Tz0RVjEcOLBORDaG5wiY7=Tn5mwn+pwmnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 12, 2021 at 8:54 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:

> Adding -hackers, sorry for the duplicate.
>
> This seems to be deficient, citing
>
> https://www.postgresql.org/message-id/flat/0d1b394b-bec9-8a71-a336-44df7078b295%40gmail.com
>
> I'm proposing something like the attached. Ideally, there would be a
> central
> place to put details, and the other places could refer to that.
>
> Since the autoanalyze patch was reverted, this should be easily applied to
> backbranches, which is probably most of its value.
>
> commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
> Author: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
> Date: Thu Jul 22 16:06:18 2021 -0500
>
> documentation deficiencies for ANALYZE of partitioned tables
>
> This is partially extracted from
> 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
> which was reverted.
>
> diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
> index 36f975b1e5..decfabff5d 100644
> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -290,6 +290,14 @@
> to meaningful statistical changes.
> </para>
>
> + <para>
> + Tuples changed in partitions and inheritence children do not count
> towards
> + analyze on the parent table. If the parent table is empty or rarely
> + changed, it may never be processed by autovacuum. It is necessary to
> + periodically run an manual <command>ANALYZE</command> to keep the
> statistics
> + of the table hierarchy up to date.
> + </para>
> +
> <para>
> As with vacuuming for space recovery, frequent updates of statistics
> are more useful for heavily-updated tables than for seldom-updated
> @@ -347,6 +355,18 @@
> <command>ANALYZE</command> commands on those tables on a suitable
> schedule.
> </para>
> </tip>
> +
> + <tip>
> + <para>
> + The autovacuum daemon does not issue <command>ANALYZE</command>
> commands for
> + partitioned tables. Inheritence parents will only be analyzed if the
> + parent is changed - changes to child tables do not trigger
> autoanalyze on
> + the parent table. It is necessary to periodically run an manual
> + <command>ANALYZE</command> to keep the statistics of the table
> hierarchy up to
> + date.
> + </para>
> + </tip>
> +
> </sect2>
>
> <sect2 id="vacuum-for-visibility-map">
> @@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze
> scale factor * number of tu
> </programlisting>
> is compared to the total number of tuples inserted, updated, or
> deleted
> since the last <command>ANALYZE</command>.
> +
> + Partitioned tables are not processed by autovacuum, and their
> statistics
> + should be updated by manually running <command>ANALYZE</command> when
> the
> + table is first populated, and whenever the distribution of data in its
> + partitions changes significantly.
> + </para>
> +
> + <para>
> + Partitioned tables are not processed by autovacuum. Statistics
> + should be collected by running a manual <command>ANALYZE</command>
> when it is
> + first populated, and updated whenever the distribution of data in its
> + partitions changes significantly.
> </para>
>
> <para>
> diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
> index 89ff58338e..b84853fd6f 100644
> --- a/doc/src/sgml/perform.sgml
> +++ b/doc/src/sgml/perform.sgml
> @@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND
> somethingelse;
> <title>Run <command>ANALYZE</command> Afterwards</title>
>
> <para>
> +
> Whenever you have significantly altered the distribution of data
> within a table, running <link
> linkend="sql-analyze"><command>ANALYZE</command></link> is strongly
> recommended. This
> includes bulk loading large amounts of data into the table. Running
> +
> <command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
> ensures that the planner has up-to-date statistics about the
> table. With no statistics or obsolete statistics, the planner might
> diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
> index c423aeeea5..20ffbc2d7a 100644
> --- a/doc/src/sgml/ref/analyze.sgml
> +++ b/doc/src/sgml/ref/analyze.sgml
> @@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable
> class="parameter">table_and_columns</replacea
> </para>
>
> <para>
> - If the table being analyzed has one or more children,
> - <command>ANALYZE</command> will gather statistics twice: once on the
> - rows of the parent table only, and a second time on the rows of the
> - parent table with all of its children. This second set of statistics
> - is needed when planning queries that traverse the entire inheritance
> - tree. The autovacuum daemon, however, will only consider inserts or
> - updates on the parent table itself when deciding whether to trigger an
> - automatic analyze for that table. If that table is rarely inserted
> into
> - or updated, the inheritance statistics will not be up to date unless
> you
> - run <command>ANALYZE</command> manually.
> + If the table being analyzed is partitioned, <command>ANALYZE</command>
> + will gather statistics by sampling blocks randomly from its
> partitions;
> + in addition, it will recurse into each partition and update its
> statistics.
> + (However, in multi-level partitioning scenarios, each leaf partition
> + will only be analyzed once.)
> + By constrast, if the table being analyzed has inheritance children,
> + <command>ANALYZE</command> will gather statistics for it twice:
> + once on the rows of the parent table only, and a second time on the
> + rows of the parent table with all of its children. This second set of
> + statistics is needed when planning queries that traverse the entire
> + inheritance tree. The child tables themselves are not individually
> + analyzed in this case.
> </para>
>
> <para>
> - If any of the child tables are foreign tables whose foreign data
> wrappers
> - do not support <command>ANALYZE</command>, those child tables are
> ignored while
> - gathering inheritance statistics.
> + The autovacuum daemon does not process partitioned tables or
> inheritence
> + parents. It is usually necessary to periodically run a manual
> + <command>ANALYZE</command> to keep the statistics of the table
> hierarchy
> + up to date (except for nonempty inheritence parents which undergo
> + modifications of their own table data).
> + See...
> + </para>
> +
> + <para>
> + If any of the child tables or partitions are foreign tables whose
> foreign
> + data wrappers do not support <command>ANALYZE</command>, those tables
> are
> + ignored while gathering inheritance statistics.
> </para>
>
> <para>
>
>
> Hi,
Minor comment:

periodically run an manual -> periodically run a manual

Cheers

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-09-13 05:34:58 Re: PG Docs - CREATE SUBSCRIPTION option list order
Previous Message Paul A Jungwirth 2021-09-13 04:12:19 Re: SQL:2011 application time