Re: WIP: WAL prefetch (another approach)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2021-05-03 01:23:25
Message-ID: CA+hUKGJyND_FkaJGmwniR_NiTpK_pPNNHaXGvb7OdjxqHUMOUw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 29, 2021 at 12:24 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2021-04-28 19:24:53 -0400, Tom Lane wrote:
> >> IOW, we've spent over twice as many CPU cycles shipping data to the
> >> standby as we did in applying the WAL on the standby.
>
> > I don't really know how the time calculation works on mac. Is there a
> > chance it includes time spent doing IO?

For comparison, on a modern Linux system I see numbers like this,
while running that 025_stream_rep_regress.pl test I posted in a nearby
thread:

USER       PID     %CPU %MEM   VSZ  RSS TTY STAT START TIME COMMAND
tmunro   2150863   22.5  0.0 55348 6752 ?   Ss   12:59 0:07 postgres: standby_1: startup recovering 00000001000000020000003C
tmunro   2150867   17.5  0.0 55024 6364 ?   Ss   12:59 0:05 postgres: standby_1: walreceiver streaming 2/3C675D80
tmunro   2150868   11.7  0.0 55296 7192 ?   Ss   12:59 0:04 postgres: primary: walsender tmunro [local] streaming 2/3C675D80
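
(On Andres's question above: on Linux at least, the TIME and %CPU
columns ps shows count CPU time only, user plus system, and don't
include wall-clock time spent sleeping on I/O. Here's a quick sketch
using getrusage(), which reports the same two buckets directly; how
macOS accounts for this I can't say.)

/* Sketch (Linux semantics; macOS accounting may differ) showing the
 * same split that ps reports: getrusage() returns CPU time consumed in
 * user mode and in the kernel, and neither bucket includes wall-clock
 * time spent blocked on I/O. */
#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("user %ld.%06lds, system %ld.%06lds\n",
               (long) ru.ru_utime.tv_sec, (long) ru.ru_utime.tv_usec,
               (long) ru.ru_stime.tv_sec, (long) ru.ru_stime.tv_usec);
    return 0;
}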

Those ratios are better, but it's still hard work, and perf shows that
the CPU time is all in page cache schlep:

22.44% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
20.12% postgres [kernel.kallsyms] [k] __add_to_page_cache_locked
7.30% postgres [kernel.kallsyms] [k] iomap_set_page_dirty
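
Those are all symbols from the kernel's buffered I/O path: every byte
of WAL is copied from user space into the page cache on write, with
pages allocated and dirtied along the way, and then copied back out
again when the startup process reads it. Roughly this pattern, sketched
here with made-up file names rather than the real walreceiver code:

/* A minimal sketch (not the actual walreceiver code) of the buffered
 * write pattern consistent with the profile above: each pwrite()
 * copies the WAL bytes from user space into the page cache
 * (copy_user_enhanced_fast_string), allocating and dirtying pages as
 * it goes (__add_to_page_cache_locked, iomap_set_page_dirty). */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define XLOG_BLCKSZ 8192            /* matches PostgreSQL's default */

int
main(void)
{
    char buf[XLOG_BLCKSZ];
    int  fd = open("/tmp/fake-wal-segment", O_CREAT | O_WRONLY, 0600);

    memset(buf, 'x', sizeof(buf));

    if (fd >= 0)
    {
        /* One user->page-cache copy per block written. */
        (void) pwrite(fd, buf, sizeof(buf), 0);
        close(fd);
    }
    return 0;
}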

That was with all three patches reverted, so it's nothing new.
Definitely room for improvement: there have been a few discussions
about not using a buffered file for high-frequency data exchange, and
about relaxing various timing rules, both of which we should look into.
But I wouldn't be at all surprised if HFS+ were just much worse at
this.
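
For illustration only, one shape the "unbuffered" idea could take is
O_DIRECT; to be clear, this is a hypothetical sketch with a made-up
file name, not something any of the patches actually do:

/* Hypothetical illustration: O_DIRECT bypasses the page cache
 * entirely, but requires block-aligned buffers, offsets and transfer
 * sizes. */
#define _GNU_SOURCE                 /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define XLOG_BLCKSZ 8192

int
main(void)
{
    void *buf;
    int   fd;

    /* O_DIRECT wants aligned memory; 4096 covers common block sizes. */
    if (posix_memalign(&buf, 4096, XLOG_BLCKSZ) != 0)
        return 1;
    memset(buf, 'x', XLOG_BLCKSZ);

    fd = open("/tmp/fake-wal-segment", O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd >= 0)
    {
        /* Data goes straight to the device, no page-cache copy. */
        (void) pwrite(fd, buf, XLOG_BLCKSZ, 0);
        close(fd);
    }
    free(buf);
    return 0;
}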

Thinking more about good old HFS+... I guess it's remotely possible
that it had coherency bugs that could be exposed by our usage pattern,
but that doesn't fit too well with the clues I have from light reading:
this is a non-SMP system, and HFS+ is said to have serialised pretty
much everything on big filesystem locks anyway.
