Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-12 01:05:15
Message-ID: CAH2-WzmALNenZzjimHNTwOnc2LgfNZOWzZW8oo=r=T7Pj7mHmA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 11, 2021 at 8:31 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I agree, but all you need is one long-lived tuple toward the end of
> the array and you're stuck never being able to truncate it. It seems
> like a worthwhile improvement, but whether it actually helps will be
> workload-dependent.

When it comes to improving VACUUM, I think that most of the really
interesting scenarios are workload-dependent in one way or another. In
fact, even that concept becomes a little meaningless much of the time.
For example, with workloads that really benefit from bottom-up
deletion, the vast majority of individual leaf pages have quite a bit
of spare capacity at any given time. Again, "rare" events can have
outsized importance in the aggregate -- most of the time every leaf
page taken individually is a-okay!

It's certainly not just indexing stuff. We have a tendency to imagine
that HOT updates occur when indexes are not logically modified, except
perhaps in the presence of some kind of stressor, like a long-running
transaction. I guess that I do the same, informally. But let's not
forget that the reality is that very few tables *consistently* get HOT
updates, regardless of the shape of indexes and UPDATE statements. So
in the long run practically all tables in many ways consist of pages
that resemble those from a table that "only gets non-HOT updates" in
the simplest sense.

I suspect that the general preference for using lower-offset LP_UNUSED
items first (inside PageAddItemExtended()) will tend to make this
problem of "one high tuple that isn't dead" not so bad in many cases.
In any case Matthias' patch makes the situation strictly better, and
we can only fix one problem at a time. We have to start by eliminating
individual low-level behaviors that *don't make sense*.
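
To make the interaction concrete, here is a toy model of a heap page's
line pointer array. It is only a sketch, not PostgreSQL's C code:
add_item() and truncate_trailing_unused() are hypothetical names, the
list uses 0-based indexes rather than real offset numbers, and the only
behaviors modeled are "reuse the lowest unused slot first" and "trailing
unused slots can be cut off".

```python
LP_UNUSED = "unused"
LP_NORMAL = "normal"


def add_item(line_pointers):
    """Place a new tuple, preferring the lowest-offset unused slot."""
    for offset, state in enumerate(line_pointers):
        if state == LP_UNUSED:
            line_pointers[offset] = LP_NORMAL
            return offset
    # No reusable slot: the array has to grow by one entry.
    line_pointers.append(LP_NORMAL)
    return len(line_pointers) - 1


def truncate_trailing_unused(line_pointers):
    """Drop unused slots from the end of the array; return how many went away."""
    removed = 0
    while line_pointers and line_pointers[-1] == LP_UNUSED:
        line_pointers.pop()
        removed += 1
    return removed


# Six slots, four of which have been reclaimed and marked unused.
page = [LP_NORMAL] * 6
for offset in (1, 3, 4, 5):
    page[offset] = LP_UNUSED

reused_offset = add_item(page)               # lands in the lowest hole
removed = truncate_trailing_unused(page)     # the high, unused tail goes away

# The case from the quoted text: one live item at the very end.
stuck_page = [LP_UNUSED] * 5 + [LP_NORMAL]
stuck_removed = truncate_trailing_unused(stuck_page)
```

Because new items fill holes from the low end, live items tend to
cluster at low offsets and the tail tends to stay free for truncation.
The second page shows the case where that does not help: a single live
item in the last slot pins the whole array.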

Jan Wieck told me that he had to set heap fill factor to the
ludicrously conservative setting of 50 just to get the
TPC-C/BenchmarkSQL OORDER and ORDER_LINE tables to be stable over time
[1] -- on-disk size stability is absolutely expected here. And these
are the biggest tables! It takes hours if not days or even weeks for
the situation to really get out of hand with a normal FF setting. I am
almost certain that this is due to second-order effects (even third-order
effects) that start from things like line pointer bloat and FSM
inefficiencies. I suspect that it doesn't matter too much if you make
heap fill factor 70 or 90 with these tables because the effect is
non-linear -- for whatever reason 50 was found to be the magic number,
through trial and error.
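
For a sense of scale, here is a back-of-the-envelope sketch of how much
space each fill factor holds back per heap page at insert time. It
assumes the default 8 kB block size and ignores page header and line
pointer overhead; reserved_bytes_per_page() is a made-up helper name
for illustration, not a PostgreSQL function.

```python
BLCKSZ = 8192  # assumed: default PostgreSQL block size, in bytes


def reserved_bytes_per_page(fillfactor, block_size=BLCKSZ):
    """Bytes left free on each page by inserts for a given heap fill factor."""
    if not 10 <= fillfactor <= 100:
        raise ValueError("heap fillfactor must be between 10 and 100")
    # Integer arithmetic: the reserved share is (100 - fillfactor) percent.
    return block_size * (100 - fillfactor) // 100


reserved = {ff: reserved_bytes_per_page(ff) for ff in (50, 70, 90, 100)}
for ff, nbytes in reserved.items():
    print(f"fillfactor {ff:3d}: {nbytes} bytes reserved per page")
```

The reserved space itself grows linearly as the fill factor drops, so
any threshold-like behavior around 50 would have to come from
somewhere other than this arithmetic.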

"Incremental VACUUM" (the broad concept, not just this one patch) is
likely to rely on our being able to make the performance
characteristics more linear, at least in future iterations. Of course
it's true that we should eliminate line pointer bloat and any kind of
irreversible bloat because the overall effect is non-linear, unstable
behavior, which is highly undesirable on its face. But it's also true
that these improvements leave us with more linear behavior at a
high level, which is itself much easier to understand and model in a
top-down fashion. It then becomes possible to build a cost model that
makes VACUUM sensitive to the needs of the app, and that keeps
on-disk sizes *stable* in a variety of conditions. So in that sense
I'd say that Matthias' patch is totally relevant.

I know that I sound hippy-dippy here. But the fact is that bottom-up
index deletion has *already* made the performance characteristics much
simpler and therefore much easier to model. I hope to do more of that.

[1] https://github.com/wieck/benchmarksql/blob/29b62435dc5c9eaf178983b43818fcbba82d4286/run/sql.postgres/extraCommandsBeforeLoad.sql#L1
--
Peter Geoghegan
