Re: Prefetch the next tuple's memory during seqscans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Prefetch the next tuple's memory during seqscans
Date: 2023-04-04 04:50:09
Message-ID: CAApHDvqaoq7bY6OcJk3pwA+XL8OAWoQXcm-gcrs2xUvCso39Dw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 4 Apr 2023 at 07:47, Gregory Stark (as CFM) <stark(dot)cfm(at)gmail(dot)com> wrote:
> The referenced patch was committed March 19th but there's been no
> comment here. Is this patch likely to go ahead this release or should
> I move it forward again?

Thanks for the reminder on this.

I have done some work on it but didn't post it here as I didn't
have good news. The problem I'm facing is that after Melanie's recent
refactoring work around heapgettup() [1], I can no longer get the
same speedup as before with pg_prefetch_mem(). While testing
Melanie's patches, I ran some performance tests and saw a good
increase in performance from the refactoring itself. I really don't
know why the prefetching no longer shows the gains it did before.
Perhaps the rearranged code allows the hardware to do a better job of
prefetching cache lines.

I am, however, inclined not to drop the pg_prefetch_mem() macro
altogether just because I can no longer demonstrate any performance
gains during sequential scans. So I decided to try what Thomas
mentioned in [2]: using the prefetching macro to fetch the required
tuples in PageRepairFragmentation() so that they're in CPU cache by
the time we get to compactify_tuples().
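
To illustrate the idea (this is not the attached patch), here is a minimal
C sketch of "prefetch while collecting, copy later". Everything named
sketch_* or Sketch* is hypothetical, the macro is only a stand-in for
pg_prefetch_mem(), and the copy goes to a separate buffer rather than
moving tuples within the page as the real code does. It assumes the GCC/Clang
__builtin_prefetch builtin and falls back to a no-op elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for pg_prefetch_mem(). __builtin_prefetch is a
 * GCC/Clang builtin; on other compilers the hint is compiled away.
 */
#if defined(__GNUC__) || defined(__clang__)
#define sketch_prefetch_mem(addr) __builtin_prefetch(addr)
#else
#define sketch_prefetch_mem(addr) ((void) 0)
#endif

/* Simplified item descriptor: where a tuple starts in the page, and its size. */
typedef struct SketchItem
{
    size_t offset;
    size_t len;
} SketchItem;

/*
 * Pass 1 walks the item array (as PageRepairFragmentation() walks line
 * pointers) and only issues prefetch hints plus a size check.
 * Pass 2 (standing in for compactify_tuples()) does the actual copying,
 * by which time the tuple data is hopefully already in CPU cache.
 * Returns the number of bytes written to dest.
 */
size_t
sketch_compactify(const char *page, const SketchItem *items, size_t nitems,
                  char *dest, size_t destsize)
{
    size_t total = 0;
    size_t used = 0;
    size_t i;

    /* Pass 1: hint the cache about each tuple and add up the space needed. */
    for (i = 0; i < nitems; i++)
    {
        sketch_prefetch_mem(page + items[i].offset);
        total += items[i].len;
    }
    assert(total <= destsize);

    /* Pass 2: copy the tuples densely into dest, in item order. */
    for (i = 0; i < nitems; i++)
    {
        memcpy(dest + used, page + items[i].offset, items[i].len);
        used += items[i].len;
    }
    return used;
}
```

The prefetch is only a hint, so the result is the same with or without it;
any benefit shows up purely as timing, which is why it has to be measured.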

I tried this using the same test as I described in [3] after adjusting
the following line to use PANIC instead of LOG:

    ereport(LOG,
            (errmsg("redo done at %X/%X system usage: %s",
                    LSN_FORMAT_ARGS(xlogreader->ReadRecPtr),
                    pg_rusage_show(&ru0))));

Doing that stops the server as soon as redo finishes, which allows me
to repeat the test using the same WAL each time.

Tests were run on an AMD 3990x CPU on Ubuntu 22.10 with 64GB RAM.

shared_buffers = 10GB
checkpoint_timeout = '1 h'
max_wal_size = 100GB
max_connections = 300

Master:

2023-04-04 15:54:55.635 NZST [15958] PANIC: redo done at 0/DC447610
system usage: CPU: user: 44.46 s, system: 0.97 s, elapsed: 45.45 s
2023-04-04 15:56:33.380 NZST [16109] PANIC: redo done at 0/DC447610
system usage: CPU: user: 43.80 s, system: 0.86 s, elapsed: 44.69 s
2023-04-04 15:57:25.968 NZST [16134] PANIC: redo done at 0/DC447610
system usage: CPU: user: 44.08 s, system: 0.74 s, elapsed: 44.84 s
2023-04-04 15:58:53.820 NZST [16158] PANIC: redo done at 0/DC447610
system usage: CPU: user: 44.20 s, system: 0.72 s, elapsed: 44.94 s

Prefetch Memory in PageRepairFragmentation():

2023-04-04 16:03:16.296 NZST [25921] PANIC: redo done at 0/DC447610
system usage: CPU: user: 41.73 s, system: 0.77 s, elapsed: 42.52 s
2023-04-04 16:04:07.384 NZST [25945] PANIC: redo done at 0/DC447610
system usage: CPU: user: 40.87 s, system: 0.86 s, elapsed: 41.74 s
2023-04-04 16:05:01.090 NZST [25968] PANIC: redo done at 0/DC447610
system usage: CPU: user: 41.20 s, system: 0.72 s, elapsed: 41.94 s
2023-04-04 16:05:49.235 NZST [25996] PANIC: redo done at 0/DC447610
system usage: CPU: user: 41.56 s, system: 0.66 s, elapsed: 42.24 s

About 6.7% performance increase over master.
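
For anyone wanting to check the arithmetic, a small C sketch follows. It
averages the elapsed times listed above and reports the change two ways,
since "performance increase" can mean a throughput ratio or a reduction in
run time. The function and array names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Elapsed seconds copied from the "redo done" log lines above. */
const double master_elapsed[4] = {45.45, 44.69, 44.84, 44.94};
const double patched_elapsed[4] = {42.52, 41.74, 41.94, 42.24};

/* Arithmetic mean of n values. */
double
sketch_mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double) n;
}

/* Throughput gain in percent: how much more work per second is done. */
double
sketch_speedup_pct(double base, double patched)
{
    return (base / patched - 1.0) * 100.0;
}

/* Run-time reduction in percent: how much less time the same work takes. */
double
sketch_time_saved_pct(double base, double patched)
{
    return (1.0 - patched / base) * 100.0;
}
```

Calling sketch_speedup_pct() and sketch_time_saved_pct() on the two means
gives figures on either side of the roughly 6.7% quoted above.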

Since I really only did the seqscan patch as a means to get the
pg_prefetch_mem() patch in, I wonder if it's ok to scrap that in
favour of the PageRepairFragmentation patch.

Updated patches attached.

David

[1] https://postgr.es/m/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ%40mail.gmail.com
[2] https://postgr.es/m/CA%2BhUKGJRtzbbhVmb83vbCiMRZ4piOAi7HWLCqs%3DGQ74mUPrP_w%40mail.gmail.com
[3] https://postgr.es/m/CAApHDvoKwqAzhiuxEt8jSquPJKDpH8DNUZDFUSX9P7DXrJdc3Q%40mail.gmail.com

Attachment                                                   Content-Type              Size
v1-0001-Add-pg_prefetch_mem-macro-to-load-cache-lines.patch  application/octet-stream  5.3 KB
prefetch_in_PageRepairFragmentation.patch                    application/octet-stream  488 bytes
