Re: Autovacuum on partitioned table (autoanalyze)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: yuzuko <yuzukohosoya(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Amit Langote <amitlangote09(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2021-07-28 02:23:42
Message-ID: 20210728022342.brary7sntq4fvsbl@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-07-22 13:54:58 -0700, Andres Freund wrote:
> On 2021-04-08 01:20:14 -0400, Alvaro Herrera wrote:
> > On 2021-Apr-07, Alvaro Herrera wrote:
> >
> > > OK, I bit the bullet and re-did the logic in the way I had proposed
> > > earlier in the thread: do the propagation on the collector's side, by
> > > sending only the list of ancestors: the collector can read the tuple
> > > change count by itself, to add it to each ancestor. This seems less
> > > wasteful. Attached is v16 which does it that way and seems to work
> > > nicely under my testing.
> >
> > Pushed with this approach. Thanks for persisting with this.
>
> I'm looking at this in the context of rebasing & polishing the shared
> memory stats patch.
>
> I have a few questions / concerns:

Another one, and I think this might warrant thinking about for v14:

Isn't this going to create a *lot* of redundant sampling? Especially if you
have any sort of nested partition tree. In the most absurd case a partition
with n parents will get sampled n times, solely due to changes to itself.

Look at the following example:

BEGIN;
DROP TABLE if exists p;
CREATE TABLE p (i int) partition by range(i);
CREATE TABLE p_0 PARTITION OF p FOR VALUES FROM ( 0) to (5000) partition by range(i);
CREATE TABLE p_0_0 PARTITION OF p_0 FOR VALUES FROM ( 0) to (1000);
CREATE TABLE p_0_1 PARTITION OF p_0 FOR VALUES FROM (1000) to (2000);
CREATE TABLE p_0_2 PARTITION OF p_0 FOR VALUES FROM (2000) to (3000);
CREATE TABLE p_0_3 PARTITION OF p_0 FOR VALUES FROM (3000) to (4000);
CREATE TABLE p_0_4 PARTITION OF p_0 FOR VALUES FROM (4000) to (5000);
-- create some initial data
INSERT INTO p select generate_series(0, 5000 - 1) data FROM generate_series(1, 100) reps;
COMMIT;

UPDATE p_0_4 SET i = i;

Whenever the update is executed, all partitions will be sampled at least twice
(once for p and once for p_0), with p_0_4 sampled three times.

Of course, this is an extreme example, but it's not hard to imagine cases
where v14 will cause the number of auto-analyzes increase sufficiently to bog
down autovacuum to a problematic degree.

Additionally, while analyzing all child partitions for a partitioned tables
are AccessShareLock'ed at once. If a partition hierarchy has more than one
level, it actually is likely that multiple autovacuum workers will end up
processing the ancestors separately. This seems like it might contribute to
lock exhaustion issues with larger partition hierarchies?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2021-07-28 02:25:43 Re: Slim down integer formatting
Previous Message Bossart, Nathan 2021-07-28 01:59:57 Re: archive status ".ready" files may be created too early