Re: New strategies for freezing, advancing relfrozenxid early

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-26 00:43:47
Message-ID: 20230126004347.gepcmyenk2csxrri@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-01-24 14:49:38 -0800, Peter Geoghegan wrote:
> On Mon, Jan 16, 2023 at 5:55 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > 0001 (the freezing strategies patch) is now committable IMV. Or at
> > least will be once I polish the docs a bit more. I plan on committing
> > 0001 some time next week, barring any objections.
>
> I plan on committing 0001 (the freezing strategies commit) tomorrow
> morning, US Pacific time.

I unfortunately haven't been able to keep up with the thread and saw this just
now. But I've expressed the concern below several times before, so it
shouldn't come as a surprise.

I think, as committed, this will cause serious issues for some reasonably
common workloads, due to substantially increased WAL traffic.

The most common problematic scenario I see is tables full of rows with a
limited lifetime, e.g. because rows get aggregated up after a while. Before,
those rows practically never got frozen - but now we'll freeze them all the
time.

I whipped up a quick test: 15 pgbench threads insert rows, 1 psql \while loop
deletes older rows.
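
For readers who want to try something of the same shape, here is a minimal sketch of such a workload. It is not the script used for the numbers in this mail: the table name, columns, retention window, run length, and the use of psql's `\watch` for the delete loop are all illustrative assumptions. It only builds the command line and SQL text; it does not connect to a server.

```python
import shlex

# All names and values below are hypothetical, chosen for illustration only.
TABLE = "limited_lifetime"   # assumed table name
RETENTION = "1 minute"       # assumed row lifetime before deletion

SCHEMA_SQL = (
    f"CREATE TABLE IF NOT EXISTS {TABLE} ("
    "id bigserial PRIMARY KEY, "
    "created_at timestamptz NOT NULL DEFAULT now(), "
    "payload text);"
)

# Custom pgbench script: each transaction inserts one row.
INSERT_SCRIPT = f"INSERT INTO {TABLE} (payload) VALUES (repeat('x', 100));\n"

# Statement the single psql session re-runs to delete older rows.
DELETE_SQL = f"DELETE FROM {TABLE} WHERE created_at < now() - interval '{RETENTION}'"


def pgbench_cmd(script_path, clients=15, seconds=600, dbname="postgres"):
    """Build a pgbench invocation: -n skips pgbench's own vacuum step,
    -c/-j set clients and worker threads, -T the run length in seconds,
    -f the custom script file."""
    return [
        "pgbench", "-n",
        "-c", str(clients), "-j", str(clients),
        "-T", str(seconds),
        "-f", script_path,
        dbname,
    ]


def psql_delete_input(interval_s=1):
    """Text to feed to psql on stdin; \\watch re-executes the query buffer
    every interval_s seconds."""
    return f"{DELETE_SQL} \\watch {interval_s}\n"


if __name__ == "__main__":
    print(SCHEMA_SQL)
    print(shlex.join(pgbench_cmd("insert.sql")))
    print(psql_delete_input(), end="")
```

Whether the working set fits in shared_buffers would then depend on the retention window and row size relative to the server's settings.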

Workload fits in s_b:

Autovacuum on average generates between 1.5x and 7x as much WAL as before,
depending on how things interact with checkpoints. And not just that, each
autovac cycle also takes substantially longer than before - the average time
for an autovacuum roughly doubled. Which of course increases the amount of
bloat.

When workload doesn't fit in s_b:

Time for vacuuming goes up to ~5x. WAL volume to ~9x. Autovacuum can't keep up
with bloat, every vacuum takes longer than the prior one:
65s->78s->139s->176s
And that's with autovac cost limits removed! Relation size nearly doubles due
to bloat.

After I disabled the new strategy autovac started to catch up again:
124s->101s->103s->46s->20s->28s->24s
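
To make the trend in the two timing series explicit, here is a small sketch that computes the ratio between consecutive autovacuum runs. The durations are the ones quoted above; the helper name `step_ratios` is just something made up for this illustration.

```python
def step_ratios(durations):
    """Ratio of each run's duration to the previous run's duration.
    A value above 1.0 means the run took longer than its predecessor."""
    return [later / earlier for earlier, later in zip(durations, durations[1:])]


# Autovacuum durations in seconds, as quoted in this mail.
with_new_strategy = [65, 78, 139, 176]
after_disabling = [124, 101, 103, 46, 20, 28, 24]

if __name__ == "__main__":
    print("new strategy:   ", [round(r, 2) for r in step_ratios(with_new_strategy)])
    print("after disabling:", [round(r, 2) for r in step_ratios(after_disabling)])
    print("overall growth: ", round(with_new_strategy[-1] / with_new_strategy[0], 2))
```

With the new strategy every ratio is above 1.0, i.e. each vacuum is slower than the one before it. After disabling it the ratios are mostly below 1.0 and the last run is far shorter than the first.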

This is significantly worse than I predicted. This was my first attempt at
coming up with a problematic workload. There'll likely be way worse in
production.

I think as-is this logic will cause massive issues.

Andres
