Re: Thinking about ANALYZE stats and autovacuum and large non-uniform tables

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thinking about ANALYZE stats and autovacuum and large non-uniform tables
Date: 2021-10-21 21:45:36
Message-ID: CAH2-Wz=HNSLrbTx3d9EbA=9nNM+7tRxznoChVhNmUYn3VTk8wg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 21, 2021 at 2:13 PM Greg Stark <stark(at)mit(dot)edu> wrote:
> The problem I'm finding is that the distribution of these small
> subsets can swing quickly. And understanding intercolumn correlations
> even if we could do it perfectly would be no help at all.
>
> Consider a table with millions of rows that are "done" but none that
> are "pending". Inserting just a few hundred or thousand new pending
> rows makes any estimates based on the existing statistics entirely
> incorrect. Even if we had perfect statistics capable of making perfect
> estimates they would be entirely wrong once a few inserts of pending
> rows are done.

I am very sympathetic to this view of things, because this asymmetry
obviously exists, and matters. There is no getting around that.
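To make the asymmetry concrete, here is a minimal SQL sketch (the table, column, and status values are hypothetical, chosen only to mirror Greg's scenario):

```sql
-- Hypothetical queue-like table; millions of 'done' rows exist.
CREATE TABLE jobs (id bigserial PRIMARY KEY, status text, payload text);

-- ANALYZE sees only 'done' rows, so 'pending' never enters the
-- most-common-values list in pg_stats.
ANALYZE jobs;

-- A burst of new work arrives:
INSERT INTO jobs (status, payload)
  SELECT 'pending', 'x' FROM generate_series(1, 1000);

-- Until the next ANALYZE, the planner still estimates roughly zero
-- matching rows here, no matter how good the stored statistics are:
EXPLAIN SELECT * FROM jobs WHERE status = 'pending';
```

Even "perfect" statistics gathered before the INSERT would produce the same wrong estimate, which is the point: the staleness is structural, not a sampling-quality problem.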

> Worse, even n_dead_tup, n_mod_since_analyze, etc. are kind of
> affected by this. It's easy (at least on older
> versions, maybe Peter's work has fixed this for btree) to get severe
> index bloat because vacuum doesn't run for a long time relative to the
> size of the busy portion of a table.

My work (especially in 14) has definitely helped a great deal with
index bloat, by cleaning it up in a targeted fashion, based on
page-level considerations. That is really the only approach that can
work; we can never expect VACUUM alone to deal with index bloat,
simply because it's totally normal and expected for bloat to grow at
an uneven rate over time.

I do still think that there is an unsolved issue here, which leads to
problems with index bloat when there isn't "B-Tree keyspace
concentration" of garbage index tuples. That problem is with the
statistics that drive VACUUM themselves; they just don't work very
well in certain cases [1]. Statistics that drive autovacuum usually
come from ANALYZE, of course. The entire intellectual justification
for database statistics doesn't really carry over to VACUUM. There are
certain "physical database" implementation details that bleed into the
way ANALYZE counts dead rows. For example, most dead tuples are
usually LP_DEAD stub line pointers (not even whole tuples). A stub is
only 4 bytes, whereas a live tuple is about 30 bytes at a minimum
(depending on how you count it). That size asymmetry confuses
ANALYZE's block-based sampling.
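A back-of-the-envelope calculation shows how lopsided the two cases are per heap page. The page, line-pointer, and tuple-header sizes below are PostgreSQL's standard on-disk constants; the per-row payload figure is an assumption for illustration:

```python
# Rough sketch of why block sampling over-weights pages of LP_DEAD stubs.
PAGE_SIZE = 8192     # default BLCKSZ
PAGE_HEADER = 24     # sizeof(PageHeaderData)
LINE_POINTER = 4     # sizeof(ItemIdData); an LP_DEAD stub is just this
TUPLE_HEADER = 24    # heap tuple header (23 bytes, MAXALIGNed to 24)
PAYLOAD = 32         # assumed user data per live row (illustrative)

usable = PAGE_SIZE - PAGE_HEADER

# Each live tuple needs a line pointer plus the tuple itself:
live_per_page = usable // (LINE_POINTER + TUPLE_HEADER + PAYLOAD)

# An LP_DEAD stub needs only its 4-byte line pointer:
dead_stubs_per_page = usable // LINE_POINTER

print(live_per_page)        # -> 136
print(dead_stubs_per_page)  # -> 2042
```

So a single sampled block can plausibly hold an order of magnitude more dead "items" than another block holds live rows, which is nothing like the roughly uniform rows-per-block picture that block-based extrapolation implicitly assumes.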

This confusion seems related to the fact that ANALYZE is really a
"logical database" thing. It's slightly amazing that statistics from
ANALYZE work as well as they do for query planning, so we shouldn't
be too surprised when the same statistics serve VACUUM poorly.

Note that the TPC-C issue I describe in [1] involves a table that's a
little bit like a queue table, but with lots of non-HOT updates (lots
overall, but only one update per logical row, ever). This might tie
things to what Thomas just said about DB2 and queue tables.

> I'm imagining to really tackle this we should be doing something like
> noticing when inserts, updates, deletes are affecting key values that
> are "rare" according to the statistics and triggering autovacuum
> ANALYZE statements that use indexes to only update the statistics for
> the relevant key ranges.

I'm not sure. I tend to think that the most promising approaches all
involve some kind of execution time smarts about the statistics, and
their inherent unreliability. Somehow query execution itself should
become less gullible, at least in cases where we really can have high
confidence in the statistics being wrong at this exact time, for this
exact key space.

[1] https://postgr.es/m/CAH2-Wz=9R83wcwZcPUH4FVPeDM4znzbzMvp3rt21+XhQWMU8+g@mail.gmail.com
--
Peter Geoghegan
