Re: Autovacuum on partitioned table (autoanalyze)

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, yuzuko <yuzukohosoya(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, David Steele <david(at)pgmasters(dot)net>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Amit Langote <amitlangote09(at)gmail(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject: Re: Autovacuum on partitioned table (autoanalyze)
Date: 2021-08-10 23:00:09
Message-ID: 202108102300.wf5nirpzia52@alvherre.pgsql
Lists: pgsql-hackers

On 2021-Aug-09, Alvaro Herrera wrote:

> > 3) What is the goal of the autovac_refresh_stats() after the loop doing
> > pgstat_report_anl_ancestors()? I think it'll be common that the stats
> > collector hasn't even processed the incoming messages by that point, not to
> > speak of actually having written out a new stats file. If it took less than
> > 10ms (PGSTAT_RETRY_DELAY) to get to autovac_refresh_stats(),
> > backend_read_statsfile() will not wait for a new stats file to be written
> > out, and we'll just re-read the state we previously did.
> >
> > It's pretty expensive to re-read the stats file in some workloads, so I'm a
> > bit concerned that we end up significantly increasing the amount of stats
> > updates/reads, without actually gaining anything reliable?
>
> This is done once per autovacuum run and the point is precisely to let
> the next block absorb the updates that were sent. In manual ANALYZE we
> do it to inform future autovacuum runs.
>
> Note that the PGSTAT_RETRY_DELAY limit is used by the autovac launcher
> only, and this code is running in the worker; we do flush out the old
> data. Yes, it's expensive, but we're not doing it once per table, just
> once per worker run.

I misunderstood what you were talking about here -- I thought it was
about the delay in autovac_refresh_stats (STATS_READ_DELAY, 1s). Now
that I look at this again I realize what your point is, and you're
right, there isn't sufficient time for the collector to absorb the
messages we sent before the next pg_class scan starts.
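
To make the timing concrete, here is a minimal sketch in Python of the freshness check as it is described in this thread. It is a toy model, not the PostgreSQL source: the function names and the millisecond timestamps are hypothetical, and only the two constants (PGSTAT_RETRY_DELAY, 10ms, and STATS_READ_DELAY, 1s) come from the discussion above.

```python
# Toy model of the stats-file freshness check discussed in this thread.
# Assumption: all timestamps are plain integers in milliseconds.
# Function names are hypothetical and do not exist in PostgreSQL.

PGSTAT_RETRY_DELAY_MS = 10    # from the thread: worker-side freshness window
STATS_READ_DELAY_MS = 1000    # from the thread: delay in autovac_refresh_stats


def worker_accepts_existing_file(file_ts_ms, now_ms):
    """Return True if the existing stats file counts as fresh enough,
    so the worker re-reads it instead of waiting for a new one."""
    return file_ts_ms >= now_ms - PGSTAT_RETRY_DELAY_MS


def file_reflects_messages(file_ts_ms, sent_ts_ms):
    """Return True only if the file was written after the messages were
    sent. This is necessary, not sufficient: the collector must also
    have processed them before writing the file."""
    return file_ts_ms > sent_ts_ms


# Stats file written at t=0, ancestor reports sent at t=3,
# refresh requested at t=8 (less than 10ms after the file was written).
file_ts, sent_ts, refresh_ts = 0, 3, 8
stale_reread = (worker_accepts_existing_file(file_ts, refresh_ts)
                and not file_reflects_messages(file_ts, sent_ts))
print("re-reads old state:", stale_reread)
```

In this model the refresh at t=8 accepts the file written at t=0, which predates the messages sent at t=3, so the worker just re-reads the state it already had. Only when the refresh happens more than 10ms after the file was written does the check fail and force a wait for a new file.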

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)
