Re: New strategies for freezing, advancing relfrozenxid early

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-26 03:48:05
Message-ID: CAH2-Wz=f2aOO5TWuwdEjVaN-deJAzo0xe4t2X16=Agf-NK+2Tg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2023 at 7:11 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > I switched between vacuum_freeze_strategy_threshold=0 and
> > > vacuum_freeze_strategy_threshold=too-high, because it's quicker/takes less
> > > warmup to set up something with smaller tables.
> >
> > This makes no sense to me, at all.
>
> It's quicker to run the workload with a table that initially is below 4GB, but
> still be able to test the eager strategy. It wouldn't change anything
> fundamental to just make the rows a bit wider, or to have a static portion of
> the table.

What does that actually mean? Wouldn't change anything fundamental?

What it would do is significantly reduce the write amplification
effect that you encountered. You came up with numbers of up to 7x, a
number that you used without any mention of checkpoint_timeout being
lowered to only 1 minute (I had to push to get that information). Had
you done things differently (a larger table, a larger checkpoint_timeout
setting), the regression would have been far smaller. So yeah, "nothing
fundamental".
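
To make the checkpoint_timeout point concrete, here is a toy model (a sketch, not PostgreSQL code) of why a short checkpoint interval inflates the cost of eager freezing. It assumes full_page_writes is on, that the first WAL-logged change to a page in each checkpoint cycle carries one full-page image, and that checkpoints happen at exact fixed intervals; all other WAL volume is ignored. The function count_fpis and the workload are hypothetical.

```python
# Toy model only: assumes full_page_writes = on, checkpoints at exact
# fixed intervals, and that the first WAL-logged change to a page in each
# checkpoint cycle carries one full-page image (FPI). Everything else
# about WAL volume is ignored.

def count_fpis(modifications, checkpoint_interval):
    """Count FPIs for a list of (page_no, time_in_seconds) modifications."""
    imaged = set()  # (page_no, checkpoint_cycle) pairs already imaged
    fpis = 0
    for page, t in modifications:
        cycle = int(t // checkpoint_interval)
        if (page, cycle) not in imaged:
            imaged.add((page, cycle))
            fpis += 1
    return fpis


if __name__ == "__main__":
    pages = range(100)
    # Hypothetical workload: VACUUM freezes each page at t=10s, and the
    # application modifies the same page again at t=130s.
    frozen_then_updated = [(p, 10) for p in pages] + [(p, 130) for p in pages]
    updated_only = [(p, 130) for p in pages]
    for interval in (60, 1800):  # 1 minute vs. 30 minutes
        print(interval,
              count_fpis(updated_only, interval),
              count_fpis(frozen_then_updated, interval))
    # prints "60 100 200" then "1800 100 100"
```

With a 60-second interval the freeze and the later update land in different checkpoint cycles, so every page is imaged twice; with a 30-minute interval the update falls in the cycle where the page was already imaged, so freezing adds nothing. The model says nothing about the actual 7x figure; it only shows the direction of the effect.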

> > How, in general, can we detect what kind of 1TB table it will be, in the
> > absence of user input?
>
> I suspect we'll need some form of heuristics to differentiate between tables
> that are more append heavy and tables that are changing more heavily.

The TPC-C tables are actually a perfect adversarial case for this,
because they are both at once. What then?

> I think
> it might be preferable to not have a hard cliff but a gradual changeover -
> hard cliffs tend to lead to issues one can't see coming.

As soon as you change your behavior, you have to account for the fact
that you behaved lazily during all prior VACUUMs. I think that
you're better off just being eager with new pages and modified pages,
while not specifically going
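
For comparison, the difference between a hard cliff and a gradual changeover can be sketched as two functions of table size. This assumes, as the quoted test setup implies, a 4GB switch-over point for the hard cliff; the ramp bounds (2GB to 8GB), the function names, and the idea of expressing eagerness as a fraction of eligible pages to freeze are all hypothetical, not part of the patch under discussion.

```python
GiB = 1024 ** 3


def eager_fraction_cliff(table_bytes: int, threshold: int = 4 * GiB) -> float:
    """Hard cliff: fully lazy below the threshold, fully eager at or above it.

    Hypothetical stand-in for a single size threshold; the >= comparison
    is an assumption.
    """
    return 1.0 if table_bytes >= threshold else 0.0


def eager_fraction_ramp(table_bytes: int,
                        low: int = 2 * GiB,
                        high: int = 8 * GiB) -> float:
    """Gradual changeover: linear ramp from 0.0 at `low` to 1.0 at `high`.

    The return value is read (hypothetically) as the fraction of eligible
    all-visible pages that VACUUM would freeze eagerly.
    """
    if table_bytes <= low:
        return 0.0
    if table_bytes >= high:
        return 1.0
    return (table_bytes - low) / (high - low)


if __name__ == "__main__":
    # Compare both policies just below and just above 4GB.
    for size in (int(3.9 * GiB), int(4.1 * GiB)):
        print(size / GiB, eager_fraction_cliff(size), eager_fraction_ramp(size))
```

Near 4GB the cliff jumps from fully lazy to fully eager, while the ramp moves only slightly. Neither function addresses the point above: once behavior changes, VACUUM still inherits the unfrozen pages left behind by earlier lazy VACUUMs.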

> IIRC I was previously handwaving at keeping track of the average age of tuples
> on all-visible pages. That could extend the prior heuristic. A heavily
> changing table will have a relatively young average, a more append only table
> will have an increasing average age.
>
>
> It might also make sense to look at the age of relfrozenxid - there's really
> no point in being overly eager if the relation is quite young.

I don't think that's true. What about bulk loading? It's a totally
valid and common requirement.
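
The two heuristics from the quoted message, and the bulk-loading objection, can be sketched together. This is a hypothetical illustration: it treats XIDs as plain non-wrapping integers, assumes VACUUM could record the xmin of every tuple on all-visible pages, and calls a table append-mostly only when that average age strictly increases across successive VACUUMs. VacuumSample, min_relfrozenxid_age, and the other names are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VacuumSample:
    # Hypothetical per-VACUUM statistics; XIDs are treated as plain,
    # non-wrapping integers, which real 32-bit XIDs are not.
    next_xid: int                 # next XID to be assigned when VACUUM ran
    all_visible_xmins: List[int]  # xmin of each tuple on all-visible pages


def average_age(sample: VacuumSample) -> float:
    """Mean age, in XIDs, of tuples on all-visible pages at one VACUUM."""
    if not sample.all_visible_xmins:
        return 0.0
    ages = [sample.next_xid - xmin for xmin in sample.all_visible_xmins]
    return sum(ages) / len(ages)


def looks_append_mostly(history: List[VacuumSample]) -> bool:
    """Append-mostly if the average age strictly increases VACUUM over VACUUM.

    A heavily changing table keeps replacing its tuples, so its average
    stays young; an append-mostly table keeps its old tuples, so the
    average keeps growing.
    """
    ages = [average_age(s) for s in history]
    return len(ages) >= 2 and all(b > a for a, b in zip(ages, ages[1:]))


def want_eager(history: List[VacuumSample],
               relfrozenxid_age: int,
               min_relfrozenxid_age: int) -> bool:
    """Eager only for append-mostly tables whose relfrozenxid is old enough."""
    return looks_append_mostly(history) and relfrozenxid_age >= min_relfrozenxid_age


if __name__ == "__main__":
    append_only = [VacuumSample(1000, [100, 200]),
                   VacuumSample(2000, [100, 200, 1100, 1200])]
    churning = [VacuumSample(1000, [900, 950]),
                VacuumSample(2000, [1900, 1950])]
    print(looks_append_mostly(append_only))  # True: 850 -> 1350
    print(looks_append_mostly(churning))     # False: 75 -> 75
    # A young table (e.g. one just bulk loaded): append-mostly, but its
    # relfrozenxid is only 1900 XIDs old, so the gate refuses to be eager.
    print(want_eager(append_only, relfrozenxid_age=1900,
                     min_relfrozenxid_age=10_000_000))  # False
```

The last check is the bulk-loading case: the table looks append-mostly, but its relfrozenxid is young, so the relfrozenxid-age gate declines to be eager even though, per the reply above, eager freezing is a common requirement there.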

--
Peter Geoghegan
