Re: document the need to analyze partitioned tables

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, yuzuko <yuzukohosoya(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: document the need to analyze partitioned tables
Date: 2023-01-12 23:27:47
Message-ID: 20230112232747.GA2111950@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 05, 2022 at 10:37:01AM +0200, Laurenz Albe wrote:
> On Mon, 2022-03-28 at 15:05 +0200, Tomas Vondra wrote:
>> I've pushed the last version, and backpatched it to 10 (not sure I'd
>> call it a bugfix, but I certainly agree with Justin it's worth
>> mentioning in the docs, even on older branches).
>
> I'd like to suggest an improvement to this. The current wording could
> be read to mean that dead tuples won't get cleaned up in partitioned tables.

Well, dead tuples won't get cleaned up in partitioned tables, as
partitioned tables do not have storage. But I see what you mean. Readers
might misinterpret this to mean that autovacuum will not process the
partitions. There's a good definition of what the docs mean by
"partitioned table" [0], but FWIW it took me some time before I
consistently read "partitioned table" to mean "only the thing with relkind
set to 'p'" and not "both the partitioned table and its partitions." So,
while the current wording it technically correct, I think it'd be
reasonable to expand it to help avoid confusion.

Here is my take on the wording:

Since all the data for a partitioned table is stored in its partitions,
autovacuum does not process partitioned tables. Instead, autovacuum
processes the individual partitions that are regular tables. This
means that autovacuum only gathers statistics for the regular tables
that serve as partitions and not for the partitioned tables. Since
queries may rely on a partitioned table's statistics, you should
collect statistics via the ANALYZE command when it is first populated,
and again whenever the distribution of data in its partitions changes
significantly.

[0] https://www.postgresql.org/docs/devel/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-01-13 00:33:14 Re: Beautify pg_walinspect docs a bit
Previous Message Jacob Champion 2023-01-12 23:19:19 Re: [EXTERNAL] Re: Support load balancing in libpq