Re: New strategies for freezing, advancing relfrozenxid early

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-26 20:45:14
Message-ID: 20230126204514.o6fjrnoitpbwvvxf@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-01-26 10:44:45 -0800, Peter Geoghegan wrote:
> On Thu, Jan 26, 2023 at 9:53 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > That's going to be very significantly more aggressive. For example
> > > it'll impact small tables very differently.
> >
> > Maybe it would be too aggressive, not sure. The cost of a freeze WAL record is
> > relatively small, with one important exception below, if we are 99.99% sure
> > that it's not going to require an FPI and isn't going to dirty the page.
> >
> > The exception is that a newer LSN on the page can cause the ringbuffer
> > replacement to trigger more aggressive WAL flushing. No meaningful
> > difference if we modified the page during pruning, or if the page was already
> > in s_b (since it likely won't be written out via the ringbuffer in that case),
> > but if checksums are off and we just hint-dirtied the page, it could be a
> > significant issue.
>
> Most of the overhead of FREEZE WAL records (with freeze plan
> deduplication and page-level freezing in) is generic WAL record header
> overhead. Your recent adversarial test case is going to choke on that,
> too. At least if you set checkpoint_timeout to 1 minute again.

I don't quite follow. What do you mean by "record header overhead"? Unless
that includes FPIs, I don't think that's commonly true?

The problematic case I am talking about is when we do *not* emit a WAL record
during pruning (because there's nothing to prune), but want to freeze the
table. If you don't log an FPI, the remaining big overhead is that increasing
the LSN on the page will often cause an XLogFlush() when writing out the
buffer.
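
To illustrate the effect described above, here is a minimal toy model in C. It
is not PostgreSQL code: ToyLSN, ToyPage, toy_write_needs_wal_flush() and
toy_log_freeze() are hypothetical names, and the only rule modelled is the
assumption that a dirty page can be written out only once WAL has been flushed
up to the page's LSN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a WAL position; not PostgreSQL's XLogRecPtr. */
typedef uint64_t ToyLSN;

/* Hypothetical, heavily simplified view of a buffered heap page. */
typedef struct ToyPage
{
    ToyLSN lsn;    /* LSN of the last WAL record that touched the page */
    bool   dirty;  /* page must be written out before its buffer is reused */
} ToyPage;

/*
 * Assumed WAL-before-data rule: writing a dirty page requires WAL to be
 * flushed at least up to the page's LSN. Returns true if a flush would
 * have to happen first.
 */
static bool
toy_write_needs_wal_flush(const ToyPage *page, ToyLSN flushed_upto)
{
    return page->dirty && page->lsn > flushed_upto;
}

/*
 * Model of emitting a freeze record without an FPI: the record is cheap,
 * but it stamps the page with a brand-new LSN.
 */
static void
toy_log_freeze(ToyPage *page, ToyLSN *insert_pos)
{
    *insert_pos += 1;          /* pretend one record was inserted */
    page->lsn = *insert_pos;   /* page now carries a very recent LSN */
    page->dirty = true;
}
```

In this model a page that was only hint-dirtied keeps its old LSN and can be
written out without a flush, while the same page after toy_log_freeze() cannot.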

I don't see what your reference to checkpoint timeout is about here?

Also, as I mentioned before, the problem isn't specific to checkpoint_timeout
= 1min. It just makes it cheaper to reproduce.

> > Thus a modification of the above logic could be to opportunistically freeze if
> > a ) it won't cause an FPI and either
> > b1) the page was already dirty before pruning, as we'll not do a ringbuffer
> > replacement in that case
> > or
> > b2) We wrote a WAL record during pruning, as the difference in flush position
> > is marginal
> >
> > An even more aggressive version would be to replace b1) with logic that'd
> > allow newly dirtying the page if it wasn't read through the ringbuffer. But
> > newly dirtying the page feels like it'd be more dangerous.
>
> In many cases we'll have to dirty the page anyway, just to set
> PD_ALL_VISIBLE. The whole way the logic works is conditioned (whether
> triggered by an FPI or triggered by my now-reverted GUC) on being able
> to set the whole page all-frozen in the VM.

IIRC setting PD_ALL_VISIBLE doesn't trigger an FPI unless we need to log hint
bits. But freezing does trigger one even without wal_log_hints.

You're right, it makes sense to consider whether we'll emit an
XLOG_HEAP2_VISIBLE record anyway.

> > A less aggressive version would be to check if any WAL records were emitted
> > during heap_page_prune() (instead of FPIs) and whether we'd emit an FPI if we
> > modified the page again. Similar to what we do now, except not requiring an
> > FPI to have been emitted.
>
> Also way more aggressive. Not nearly enough on its own.

In which cases will it be problematically more aggressive?

If we emitted a WAL record during pruning we've already set the LSN of the
page to a very recent LSN. We know the page is dirty. Thus we'll already
trigger an XLogFlush() during ringbuffer replacement. We won't emit an FPI.
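
The two freezing rules discussed in this message can be written down as plain
predicates. This is only a sketch of the decision logic under stated
assumptions, not a patch: ToyFreezeInputs and both functions are hypothetical,
and the three booleans are assumed to be available after pruning.

```c
#include <stdbool.h>

/* Hypothetical per-page facts, assumed to be known once pruning is done. */
typedef struct ToyFreezeInputs
{
    bool prune_emitted_wal;        /* pruning wrote a WAL record for the page */
    bool freeze_would_need_fpi;    /* modifying the page again would log an FPI */
    bool page_dirty_before_prune;  /* page was already dirty before pruning */
} ToyFreezeInputs;

/*
 * "Less aggressive" rule: freeze only if pruning already emitted a WAL
 * record (page is dirty and has a recent LSN) and freezing adds no FPI.
 */
static bool
toy_freeze_less_aggressive(const ToyFreezeInputs *in)
{
    if (in->freeze_would_need_fpi)
        return false;
    return in->prune_emitted_wal;
}

/*
 * Rule a) with b1) or b2) from the quoted text: no FPI, and either the
 * page was already dirty before pruning or pruning wrote a WAL record.
 */
static bool
toy_freeze_opportunistic(const ToyFreezeInputs *in)
{
    if (in->freeze_would_need_fpi)
        return false;
    return in->page_dirty_before_prune || in->prune_emitted_wal;
}
```

Neither predicate ever returns true when an FPI would be needed, which is the
property that separates them from triggering on pgWalUsage.wal_fpi.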

> > But to me it seems a bit odd that VACUUM now is more aggressive if checksums /
> > wal_log_hints are on, than without them. Which I think is how using either
> > of pgWalUsage.wal_fpi, pgWalUsage.wal_records ends up working?
>
> Which part is the odd part? Is it odd that page-level freezing works
> that way, or is it odd that page-level checksums work that way?

That page-level freezing works that way.

> In any case this seems like an odd thing for you to say, having
> eviscerated a patch that really just made the same behavior trigger
> independently of FPIs in some tables, controlled via a GUC.

jdksjfkjdlkajsd;lfkjasd;lkfj;alskdfj

The behaviour I criticized was causing a torrent of FPIs and additional
dirtying of pages. My proposed replacement for the current FPI check doesn't,
because a) it only triggers when we wrote a WAL record and b) it doesn't
trigger if we would write an FPI.

Greetings,

Andres Freund
