document the need to analyze partitioned tables

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: yuzuko <yuzukohosoya(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: document the need to analyze partitioned tables
Date: 2021-09-13 03:54:09
Message-ID: 20210913035409.GA10647@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Adding -hackers, sorry for the duplicate.

This seems to be deficient, citing
https://www.postgresql.org/message-id/flat/0d1b394b-bec9-8a71-a336-44df7078b295%40gmail.com

I'm proposing something like the attached. Ideally, there would be a central
place to put details, and the other places could refer to that.

Since the autoanalyze patch was reverted, this should be easily applied to
backbranches, which is probably most of its value.

commit 4ad2c8f6fd8eb26d76b226e68d3fdb8f0658f113
Author: Justin Pryzby <pryzbyj(at)telsasoft(dot)com>
Date: Thu Jul 22 16:06:18 2021 -0500

documentation deficiencies for ANALYZE of partitioned tables

This is partially extracted from 1b5617eb844cd2470a334c1d2eec66cf9b39c41a,
which was reverted.

diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 36f975b1e5..decfabff5d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -290,6 +290,14 @@
to meaningful statistical changes.
</para>

+ <para>
+ Tuples changed in partitions and inheritence children do not count towards
+ analyze on the parent table. If the parent table is empty or rarely
+ changed, it may never be processed by autovacuum. It is necessary to
+ periodically run an manual <command>ANALYZE</command> to keep the statistics
+ of the table hierarchy up to date.
+ </para>
+
<para>
As with vacuuming for space recovery, frequent updates of statistics
are more useful for heavily-updated tables than for seldom-updated
@@ -347,6 +355,18 @@
<command>ANALYZE</command> commands on those tables on a suitable schedule.
</para>
</tip>
+
+ <tip>
+ <para>
+ The autovacuum daemon does not issue <command>ANALYZE</command> commands for
+ partitioned tables. Inheritence parents will only be analyzed if the
+ parent is changed - changes to child tables do not trigger autoanalyze on
+ the parent table. It is necessary to periodically run an manual
+ <command>ANALYZE</command> to keep the statistics of the table hierarchy up to
+ date.
+ </para>
+ </tip>
+
</sect2>

<sect2 id="vacuum-for-visibility-map">
@@ -817,6 +837,18 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
</programlisting>
is compared to the total number of tuples inserted, updated, or deleted
since the last <command>ANALYZE</command>.
+
+ Partitioned tables are not processed by autovacuum, and their statistics
+ should be updated by manually running <command>ANALYZE</command> when the
+ table is first populated, and whenever the distribution of data in its
+ partitions changes significantly.
+ </para>
+
+ <para>
+ Partitioned tables are not processed by autovacuum. Statistics
+ should be collected by running a manual <command>ANALYZE</command> when it is
+ first populated, and updated whenever the distribution of data in its
+ partitions changes significantly.
</para>

<para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 89ff58338e..b84853fd6f 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1765,9 +1765,11 @@ SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;
<title>Run <command>ANALYZE</command> Afterwards</title>

<para>
+
Whenever you have significantly altered the distribution of data
within a table, running <link linkend="sql-analyze"><command>ANALYZE</command></link> is strongly recommended. This
includes bulk loading large amounts of data into the table. Running
+
<command>ANALYZE</command> (or <command>VACUUM ANALYZE</command>)
ensures that the planner has up-to-date statistics about the
table. With no statistics or obsolete statistics, the planner might
diff --git a/doc/src/sgml/ref/analyze.sgml b/doc/src/sgml/ref/analyze.sgml
index c423aeeea5..20ffbc2d7a 100644
--- a/doc/src/sgml/ref/analyze.sgml
+++ b/doc/src/sgml/ref/analyze.sgml
@@ -250,22 +250,33 @@ ANALYZE [ VERBOSE ] [ <replaceable class="parameter">table_and_columns</replacea
</para>

<para>
- If the table being analyzed has one or more children,
- <command>ANALYZE</command> will gather statistics twice: once on the
- rows of the parent table only, and a second time on the rows of the
- parent table with all of its children. This second set of statistics
- is needed when planning queries that traverse the entire inheritance
- tree. The autovacuum daemon, however, will only consider inserts or
- updates on the parent table itself when deciding whether to trigger an
- automatic analyze for that table. If that table is rarely inserted into
- or updated, the inheritance statistics will not be up to date unless you
- run <command>ANALYZE</command> manually.
+ If the table being analyzed is partitioned, <command>ANALYZE</command>
+ will gather statistics by sampling blocks randomly from its partitions;
+ in addition, it will recurse into each partition and update its statistics.
+ (However, in multi-level partitioning scenarios, each leaf partition
+ will only be analyzed once.)
+ By constrast, if the table being analyzed has inheritance children,
+ <command>ANALYZE</command> will gather statistics for it twice:
+ once on the rows of the parent table only, and a second time on the
+ rows of the parent table with all of its children. This second set of
+ statistics is needed when planning queries that traverse the entire
+ inheritance tree. The child tables themselves are not individually
+ analyzed in this case.
</para>

<para>
- If any of the child tables are foreign tables whose foreign data wrappers
- do not support <command>ANALYZE</command>, those child tables are ignored while
- gathering inheritance statistics.
+ The autovacuum daemon does not process partitioned tables or inheritence
+ parents. It is usually necessary to periodically run a manual
+ <command>ANALYZE</command> to keep the statistics of the table hierarchy
+ up to date (except for nonempty inheritence parents which undergo
+ modifications of their own table data).
+ See...
+ </para>
+
+ <para>
+ If any of the child tables or partitions are foreign tables whose foreign
+ data wrappers do not support <command>ANALYZE</command>, those tables are
+ ignored while gathering inheritance statistics.
</para>

<para>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2021-09-13 04:04:32 Re: TAP test for recovery_end_command
Previous Message Bharath Rupireddy 2021-09-13 03:21:18 Re: Remove duplicate static function check_permissions in slotfuncs.c and logicalfuncs.c