Re: New WAL record to detect the checkpoint redo location

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New WAL record to detect the checkpoint redo location
Date: 2023-10-05 18:34:00
Message-ID: 20231005183400.n5myso7vu6crd656@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-10-02 10:42:37 -0400, Robert Haas wrote:
> I was trying to think of a test case where XLogInsertRecord would be
> exercised as heavily as possible, so I really wanted to generate a lot
> of WAL while doing as little real work as possible. The best idea that
> I had was to run pg_create_restore_point() in a loop.

What I use for that is pg_logical_emit_message(). Something like

SELECT count(*)
FROM
(
SELECT pg_logical_emit_message(false, '1', 'short'), generate_series(1, 10000)
);

run via pgbench does seem to exercise that path nicely.

> One possible conclusion is that the differences here aren't actually
> big enough to get stressed about, but I don't want to jump to that
> conclusion without investigating the competing hypothesis that this
> isn't the right way to test this, and that some better test would show
> clearer results. Suggestions?

I saw some small differences in runtime running pgbench with the above query,
with a single client. Comparing profiles showed a surprising degree of
difference. That turns out to be mostly a consequence of the fact that
ReserveXLogInsertLocation() isn't inlined anymore, because there now are two
callers of the function in XLogInsertRecord().
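
To illustrate the inlining effect: compilers will usually inline a static
function that has a single call site, but fall back to a size heuristic once
there is a second one. Below is a minimal standalone sketch of forcing the
decision with the GCC/Clang always_inline attribute (which is what
pg_attribute_always_inline expands to on those compilers). All names here
(ALWAYS_INLINE, reserve_space, insert_record, the two counters) are
hypothetical stand-ins, not the actual xlog.c code.

```c
#include <stdint.h>

/* Hypothetical stand-in for pg_attribute_always_inline. */
#if defined(__GNUC__)
#define ALWAYS_INLINE __attribute__((always_inline)) inline
#else
#define ALWAYS_INLINE inline
#endif

/* Toy insert position; the real code keeps this in shared memory. */
static uint64_t cur_bytepos = 0;
/* Toy bookkeeping done only on the rare, special path. */
static uint64_t last_special_pos = 0;

/* Reserve 'size' bytes and return where the reservation starts. */
static ALWAYS_INLINE uint64_t
reserve_space(uint64_t size)
{
	uint64_t	start = cur_bytepos;

	cur_bytepos += size;
	return start;
}

/*
 * Two call sites of reserve_space() in one function. Without the
 * attribute the compiler is free to emit an out-of-line call for both.
 */
uint64_t
insert_record(int is_special, uint64_t size)
{
	uint64_t	start;

	if (is_special)
	{
		start = reserve_space(size);	/* call site 1 */
		last_special_pos = start;
	}
	else
		start = reserve_space(size);	/* call site 2 */

	return start;
}

/* Accessor so the special-path bookkeeping can be observed. */
uint64_t
get_last_special_pos(void)
{
	return last_special_pos;
}
```

Whether the attribute actually helps depends on the compiler and flags, so it
is worth comparing the generated code or a profile before and after.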

Unfortunately, I still see a small performance difference after that. To get
the most reproducible numbers, I disabled turbo boost, bound postgres to one
CPU core, and bound pgbench to another core. Over a few runs I quite
reproducibly get ~319.323 tps with your patches applied (+ always inline), and
~324.674 with master.

If I add an unlikely() around if (rechdr->xl_rmid == RM_XLOG_ID), the
performance does improve. But that "only" brings it up to 322.406. Not sure
where the rest goes.
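
For reference, a minimal standalone sketch of what such a branch hint looks
like. The unlikely() macro below mirrors the usual __builtin_expect idiom on
GCC/Clang; the struct, the TOY_RM_SPECIAL_ID constant and the function name
are hypothetical and only stand in for the real record header and
RM_XLOG_ID check.

```c
#include <stdint.h>

/* Branch hint: tell the compiler the condition is rarely true. */
#if defined(__GNUC__)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define unlikely(x) ((x) != 0)
#endif

/* Hypothetical resource manager id used for the rare case. */
#define TOY_RM_SPECIAL_ID 0

/* Hypothetical, heavily reduced record header. */
typedef struct ToyRecordHeader
{
	uint8_t		xl_rmid;
} ToyRecordHeader;

/*
 * Returns 1 if the record needs the rare special handling, 0 otherwise.
 * The hint lets the compiler lay out the common path as the fall-through.
 */
int
needs_special_handling(const ToyRecordHeader *rechdr)
{
	if (unlikely(rechdr->xl_rmid == TOY_RM_SPECIAL_ID))
		return 1;

	return 0;
}
```

The hint changes only code layout, not semantics, which fits with it
recovering part of the difference but not all of it.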

One thing that's notable, but not related to the patch, is that we waste a
fair bit of CPU time below XLogInsertRecord() with divisions. I think they're
all due to the use of UsableBytesInSegment in
XLogBytePosToRecPtr/XLogBytePosToEndRecPtr. The multiplication in
XLogSegNoOffsetToRecPtr() also shows up.
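
To make the source of those divisions concrete, here is a simplified
standalone sketch of a "usable byte position" to WAL pointer conversion,
loosely modeled on XLogBytePosToRecPtr(). The constants (8192-byte pages,
24-byte short and 40-byte long page headers, 16MB segments) and all names are
assumptions for illustration. The relevant point is that the segment size,
and therefore the usable bytes per segment, is a runtime value, so the
compiler presumably cannot turn the division and modulo by it into cheaper
multiply/shift sequences.

```c
#include <stdint.h>

/* Assumed layout constants, for illustration only. */
#define TOY_BLCKSZ			8192
#define TOY_SHORT_PHD		24	/* header on ordinary pages */
#define TOY_LONG_PHD		40	/* header on first page of a segment */
#define TOY_USABLE_IN_PAGE	(TOY_BLCKSZ - TOY_SHORT_PHD)

/* Runtime values: the segment size is not known at compile time. */
static uint64_t toy_segment_size;
static uint64_t toy_usable_in_segment;

void
toy_set_segment_size(uint64_t segsize)
{
	toy_segment_size = segsize;
	toy_usable_in_segment =
		(segsize / TOY_BLCKSZ) * TOY_USABLE_IN_PAGE -
		(TOY_LONG_PHD - TOY_SHORT_PHD);
}

/* Convert a count of usable bytes into a WAL position with headers. */
uint64_t
toy_bytepos_to_recptr(uint64_t bytepos)
{
	/* Division and modulo by a runtime variable: real div instructions. */
	uint64_t	fullsegs = bytepos / toy_usable_in_segment;
	uint64_t	bytesleft = bytepos % toy_usable_in_segment;
	uint64_t	seg_offset;

	if (bytesleft < TOY_BLCKSZ - TOY_LONG_PHD)
	{
		/* Fits on the first page of the segment. */
		seg_offset = bytesleft + TOY_LONG_PHD;
	}
	else
	{
		/* Skip the first page, then count whole pages (constant divisor). */
		uint64_t	fullpages;

		seg_offset = TOY_BLCKSZ;
		bytesleft -= TOY_BLCKSZ - TOY_LONG_PHD;
		fullpages = bytesleft / TOY_USABLE_IN_PAGE;
		bytesleft = bytesleft % TOY_USABLE_IN_PAGE;
		seg_offset += fullpages * TOY_BLCKSZ + bytesleft + TOY_SHORT_PHD;
	}

	/* The multiplication by the (runtime) segment size. */
	return fullsegs * toy_segment_size + seg_offset;
}
```

With a 16MB segment, toy_bytepos_to_recptr(0) lands just past the long header
at offset 40, and the first usable byte of the second segment maps to
16777216 + 40.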

Greetings,

Andres Freund
