Re: New strategies for freezing, advancing relfrozenxid early

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-26 03:10:59
Message-ID: 20230126031059.qju3qfrqiaepp3ws@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-01-25 18:31:16 -0800, Peter Geoghegan wrote:
> On Wed, Jan 25, 2023 at 5:49 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Sure. But significantly regressing plausible if not common workloads is
> > different than knowing that there'll be some edge case where we'll do
> > something worse.
>
> That's very vague. Significant to whom, for what purpose?

Sure, it's vague. But you can't tell me that it's uncommon to use postgres to
store rows that aren't retained for > 50 million xids.

> > I reproduced both with checkpoint_timeout=5min and 1min. 1min is easier for
> > impatient me.
>
> You said "Autovacuum on average generates between 1.5x-7x as much WAL
> as before". Why stop there, though? There's a *big* multiplicative
> effect in play here from FPIs, obviously, so the sky's the limit. Why
> not set checkpoint_timeout to 30s?

The amount of WAL increases substantially even with 5min, though the degree of
the increase varies more. But that largely vanishes if you increase the time
after which rows are deleted a bit. I'm just not patient enough to wait for
that.

> > I switched between vacuum_freeze_strategy_threshold=0 and
> > vacuum_freeze_strategy_threshold=too-high, because it's quicker/takes less
> > warmup to set up something with smaller tables.
>
> This makes no sense to me, at all.

It's quicker to run the workload with a table that initially is below 4GB,
while still being able to test the eager strategy. It wouldn't change anything
fundamental to just make the rows a bit wider, or to have a static portion of
the table.

And changing between vacuum_freeze_strategy_threshold=0/very-large (or I
assume -1, didn't check) while the workload is running avoids having to wait
until the 120s before deletion starts have passed.

> > The concrete setting of vacuum_freeze_strategy_threshold doesn't matter.
> > Table size simply isn't a usable proxy for whether eager freezing is a good
> > idea or not.
>
> It's not supposed to be - you have it backwards. It's intended to work
> as a proxy for whether lazy freezing is a bad idea, particularly in
> the worst case.

That's a distinction without a difference.

> There is also an effect that likely would have been protective with
> your test case had you used a larger table with the same test case
> (and had you not lowered vacuum_freeze_strategy_threshold from its
> already low default).

Again, you just need a less heavily changing portion of the table or a
slightly larger "deletion delay" and you end up with a table well over
4GB. Even as stated, I end up with > 4GB after a bit of running.

It's just a shortcut to make testing this easier.

> > You can have a 1TB table full of transient data, or you can have a 1TB table
> > where part of the data is transient and only settles after a time. In neither
> > case eager freezing is ok.
>
> It sounds like you're not willing to accept any kind of trade-off.

I am, just not every tradeoff. I don't see any useful tradeoffs based purely
on the relation size.

> How, in general, can we detect what kind of 1TB table it will be, in the
> absence of user input?

I suspect we'll need some form of heuristics to differentiate between tables
that are more append heavy and tables that are changing more heavily. I think
it might be preferable to not have a hard cliff but a gradual changeover -
hard cliffs tend to lead to issues one can't see coming.

I think several of the heuristics below become easier once we introduce "xid
age" vacuums.

One idea is to start tracking the number of all-frozen pages in pg_class. If
there's a significant percentage of all-visible but not all-frozen pages,
vacuum should be more eager. If only a small portion of the table is not
frozen, there's no need to be eager. If only a small portion of the table is
all-visible, there similarly is no need to freeze eagerly.
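
A minimal sketch of that heuristic, in Python for brevity (the counter names
rel_pages, all_visible_pages and all_frozen_pages are assumptions for
illustration; pg_class tracks relpages and relallvisible today, but not an
all-frozen count):

```python
def freeze_eagerness(rel_pages: int, all_visible_pages: int,
                     all_frozen_pages: int) -> float:
    """Return a value in [0.0, 1.0]; higher means vacuum should freeze
    more eagerly.

    The fraction of pages that are all-visible but not yet all-frozen is
    small both when the table is mostly frozen already and when it churns
    too heavily to be all-visible - the two cases where eager freezing
    buys nothing.
    """
    if rel_pages == 0:
        return 0.0
    unfrozen_visible = max(all_visible_pages - all_frozen_pages, 0)
    return unfrozen_visible / rel_pages
```

With made-up numbers: a settled, append-mostly table (900 of 1000 pages
all-visible, only 100 frozen) scores 0.8, while a heavily churning one (50 of
1000 all-visible) scores 0.05 - a gradual signal rather than a hard cliff.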

IIRC I was previously handwaving at keeping track of the average age of tuples
on all-visible pages. That could extend the prior heuristic. A heavily
changing table will have a relatively young average age, a more append-only
table will have an increasing average age.

It might also make sense to look at the age of relfrozenxid - there's really
no point in being overly eager if the relation is quite young. And a very
heavily changing table will tend to be younger. But likely the approach of
tracking the age of all-visible pages will be more accurate.
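
As a hedged illustration of that dampening (the linear ramp and the helper
name are assumptions; 200 million is the default autovacuum_freeze_max_age):

```python
def age_factor(relfrozenxid_age: int,
               freeze_max_age: int = 200_000_000) -> float:
    """Scale eagerness from 0.0 (relfrozenxid just advanced) toward 1.0
    as the relation's xid age approaches the anti-wraparound threshold,
    so young - and typically heavily changing - tables are treated
    lazily."""
    return min(max(relfrozenxid_age, 0) / freeze_max_age, 1.0)
```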

The heuristics don't have to be perfect. If we get progressively more eager,
an occasional somewhat eager vacuum isn't a huge issue, as long as it then
leads the next few vacuums to become less eager.
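
A toy simulation of that self-correction (all numbers are made up;
wants_eager is a hypothetical trigger based on the unfrozen all-visible
fraction):

```python
def wants_eager(rel_pages: int, all_visible: int, all_frozen: int,
                threshold: float = 0.2) -> bool:
    """Trigger eager freezing once enough all-visible pages are unfrozen."""
    return (all_visible - all_frozen) / rel_pages > threshold

rel_pages, all_visible, all_frozen = 1000, 800, 100
decisions = []
for _ in range(3):
    eager = wants_eager(rel_pages, all_visible, all_frozen)
    decisions.append(eager)
    if eager:
        # An eager vacuum freezes the all-visible pages, paying down the
        # accumulated freezing debt in one go.
        all_frozen = all_visible
```

The first vacuum comes out eager; having paid down the debt, the following
ones fall back to lazy until new all-visible-but-unfrozen pages accumulate.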

> And in the absence of user input, why would we prefer to default to a
> behavior that is highly destabilizing when we get it wrong?

Users know the current behaviour. Introducing significant regressions that
didn't previously exist will cause new issues and new frustrations.

Greetings,

Andres Freund
