Re: Streaming read-ready sequential scan code

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Streaming read-ready sequential scan code
Date: 2024-04-05 17:55:37
Message-ID: CAAKRu_Z4hjfHvqomhjr0McYPYbtw7iW-5-gAyFCbLOfzTCOrbQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 5, 2024 at 12:15 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> Yeah, I plead benchmarking myopia, sorry. The fastpath as committed
> is only reached when distance goes 2->1, as pg_prewarm does. Oops.
> With the attached minor rearrangement, it works fine. I also poked
> some more at that memory prefetcher. Here are the numbers I got on a
> desktop system (Intel i9-9900 @ 3.1GHz, Linux 6.1, turbo disabled,
> cpufreq governor=performance, 2MB huge pages, SB=8GB, consumer NVMe,
> GCC -O3).
>
> create table t (i int, filler text) with (fillfactor=10);
> insert into t
> select g, repeat('x', 900) from generate_series(1, 560000) g;
> vacuum freeze t;
> set max_parallel_workers_per_gather = 0;
>
> select count(*) from t;
>
> cold = must be read from actual disk (Linux drop_caches)
> warm = read from Linux page cache
> hot = already in pg cache via pg_prewarm
>
>                                     cold   warm    hot
> master                            2479ms  886ms  200ms
> seqscan                           2498ms  716ms  211ms <-- regression
> seqscan + fastpath                2493ms  711ms  200ms <-- fixed, I think?
> seqscan + memprefetch             2499ms  716ms  182ms
> seqscan + fastpath + memprefetch  2505ms  710ms  170ms <-- \O/
>
> Cold has no difference. That's just my disk demonstrating Linux RA at
> 128kB (default); random I/O is obviously a more interesting story.
> It's consistently a smidgen faster with Linux RA set to 2MB (as in
> blockdev --setra 4096 /dev/nvmeXXX), and I believe this effect
> probably also increases on fancier faster storage than what I have on
> hand:
>
>                                     cold
> master                            1775ms
> seqscan + fastpath + memprefetch  1700ms
>
> Warm is faster as expected (fewer system calls schlepping data
> kernel->userspace).
>
> The interesting column is hot. The 200ms->211ms regression is due to
> the extra bookkeeping in the slow path. The rejiggered fastpath code
> fixes it for me, or maybe sometimes shows an extra 1ms. Phew. Can
> you reproduce that?

I am able to reproduce the fast path solving the issue using Heikki's
example here [1] but in shared buffers (hot).

master: 25 ms
stream read: 29 ms
stream read + fast path: 25 ms

I haven't looked into or reviewed the memory prefetching part.

While reviewing 0002, I realized that I don't quite see how
read_stream_get_block() will be used in the fast path, which is what
its comments claim. read_stream_next_buffer() is the only caller of
read_stream_look_ahead()->read_stream_get_block(), and if fast_path is
true, read_stream_next_buffer() always returns before calling
read_stream_look_ahead(). Maybe I am missing something. I see that
fast_path uses read_stream_fill_blocknums() to invoke the callback.

Oh, and why does the READ_STREAM_DISABLE_FAST_PATH macro exist?

Otherwise 0002 looks good to me.

I haven't reviewed 0003 or 0004. I attached a new version (v11)
because I noticed an outdated comment in my seq scan streaming read
user patch (0001). The other patches in the set are untouched from
your versions besides adding author/reviewer info in the commit
message for 0002.

- Melanie

[1] https://www.postgresql.org/message-id/3b0f3701-addd-4629-9257-cf28e1a6e6a1%40iki.fi

Attachment                                                       Content-Type  Size
v11-0003-Add-pg_prefetch_mem-macro-to-load-cache-lines.patch     text/x-patch  4.7 KB
v11-0002-Improve-read_stream.c-s-fast-path.patch                 text/x-patch  4.9 KB
v11-0001-Use-streaming-IO-in-heapam-sequential-and-TID-ra.patch  text/x-patch  7.2 KB
v11-0004-Prefetch-page-header-memory-when-streaming-relat.patch  text/x-patch  1.7 KB
