Re: Avoiding unnecessary clog lookups while freezing

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoiding unnecessary clog lookups while freezing
Date: 2022-12-29 00:20:17
Message-ID: 20221229002017.2mnbsd6cjyxscuye@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-12-28 15:24:28 -0800, Peter Geoghegan wrote:
> I took another look at the code coverage situation around freezing
> following pushing the page-level freezing patch earlier today. I
> spotted an issue that I'd missed up until now: certain sanity checks
> in heap_prepare_freeze_tuple() call TransactionIdDidCommit() more
> often than really seems necessary.
>
> Theoretically this is an old issue that dates back to commit
> 699bf7d05c, as opposed to an issue in the page-level freezing patch.
> But that's not really true in a real practical sense. In practice the
> calls to TransactionIdDidCommit() will happen far more frequently
> following today's commit 1de58df4fe (since we're using OldestXmin as
> the cutoff that gates performing freeze_xmin/freeze_xmax processing --
> not FreezeLimit).

Hm. But we still only do the check when we actually freeze, rather than just
during the pre-check in heap_tuple_should_freeze(). So we'll only incur the
increased overhead when we also do more WAL logging etc. Correct?

> I took another look at the code coverage situation around freezing
> I see no reason why we can't just condition the XID sanity check calls
> to TransactionIdDidCommit() (for both the freeze_xmin and the
> freeze_xmax callsites) on a cheap tuple hint bit precheck not being
> enough. We only need an expensive call to TransactionIdDidCommit()
> when the precheck doesn't immediately indicate that the tuple xmin
> looks committed when that's what the sanity check expects to see (or
> that the tuple's xmax looks aborted, in the case of the callsite where
> that's what we expect to see).

Hm. I dimply recall that we had repeated cases where the hint bits were set
wrongly due to some of the multixact related bugs. I think I was trying to be
paranoid about not freezing stuff in those situations, since it can lead to
reviving dead tuples, which obviously is bad.

> Attached patch shows what I mean. A quick run of the standard
> regression tests (with custom instrumentation) shows that this patch
> eliminates 100% of all relevant calls to TransactionIdDidCommit(), for
> both the freeze_xmin and the freeze_xmax callsites.

There's practically no tuple-level concurrent activity in a normal regression
test. So I'm a bit doubtful that is an interesting metric. It'd be a tad more
interesting to run tests, including isolationtester and pgbench, against a
running cluster.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-12-29 00:35:38 windows/meson cfbot warnings
Previous Message Tom Lane 2022-12-29 00:05:35 Re: perl 5.36, C99, -Wdeclaration-after-statement -Wshadow=compatible-local