Streaming read-ready sequential scan code

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Streaming read-ready sequential scan code
Date: 2024-01-29 21:17:24
Message-ID: CAAKRu_YtXJiYKQvb5JsA2SkwrsizYLugs4sSOZh3EAjKUg=gEQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Last year, David and I worked on a round of refactoring for
heapgettup() and heapgettup_pagemode() [1]. Now that the streaming
read API has been proposed [2], there is a bit more refactoring that
can be done on master to prepare sequential scan to support streaming
reads.

Patches 0001 and 0002 in the attached patchset do this new round of
refactoring. 0003 is the remainder of the streaming read API that is
not yet in master. 0004 is the sequential scan streaming read user.

The primary change needed to be able to drop in streaming read support
is that heapgettup() and heapgettup_pagemode() must wait until there
are no more valid buffers, rather than no more valid BlockNumbers, to
know that the relation has been entirely processed. Streaming reads
prefetch ahead of the block currently being processed by the scan, so
all blocks will have been requested well before all of them have been
processed.
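
As a rough illustration of why the termination condition has to change,
here is a toy model. It is not code from the patches: ToyStream, the
toy_* functions, and the fixed lookahead of 4 are all made up for this
sketch and are not the streaming read API. One loop stops when there
are no more block numbers to request; the other stops when the stream
returns an invalid buffer.

```c
#include <assert.h>

#define TOY_INVALID_BUFFER 0    /* hypothetical stand-in for InvalidBuffer */
#define TOY_LOOKAHEAD      4    /* hypothetical prefetch distance */

/* Toy stream: requests blocks ahead of the buffer the scan is consuming. */
typedef struct ToyStream
{
    int next_block;             /* next block number to request */
    int nblocks;                /* relation size in blocks */
    int queue[TOY_LOOKAHEAD];   /* buffers requested but not yet consumed */
    int head;
    int count;
} ToyStream;

static void
toy_stream_init(ToyStream *s, int nblocks)
{
    s->next_block = 0;
    s->nblocks = nblocks;
    s->head = 0;
    s->count = 0;
}

/* Return the next buffer, topping up the lookahead queue first. */
static int
toy_stream_next_buffer(ToyStream *s)
{
    int buf;

    while (s->count < TOY_LOOKAHEAD && s->next_block < s->nblocks)
    {
        int tail = (s->head + s->count) % TOY_LOOKAHEAD;

        /* buffer id = block + 1, so that 0 stays invalid */
        s->queue[tail] = s->next_block + 1;
        s->next_block++;
        s->count++;
    }
    assert(s->count <= TOY_LOOKAHEAD);

    if (s->count == 0)
        return TOY_INVALID_BUFFER;

    buf = s->queue[s->head];
    s->head = (s->head + 1) % TOY_LOOKAHEAD;
    s->count--;
    return buf;
}

/* Old-style condition: stop once every block number has been requested. */
static int
toy_scan_until_no_blocks(int nblocks)
{
    ToyStream s;
    int processed = 0;

    toy_stream_init(&s, nblocks);
    while (s.next_block < s.nblocks)
    {
        if (toy_stream_next_buffer(&s) != TOY_INVALID_BUFFER)
            processed++;
    }
    return processed;           /* misses buffers still in the queue */
}

/* New-style condition: stop only when the stream has no buffers left. */
static int
toy_scan_until_no_buffers(int nblocks)
{
    ToyStream s;
    int processed = 0;

    toy_stream_init(&s, nblocks);
    while (toy_stream_next_buffer(&s) != TOY_INVALID_BUFFER)
        processed++;
    return processed;
}
```

In this toy, with 10 blocks and a lookahead of 4,
toy_scan_until_no_blocks() stops after 7 pages because the last three
buffers have been requested but not yet consumed, while
toy_scan_until_no_buffers() processes all 10.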

To change this, I split up heapgetpage() into two functions -- one
responsible for getting blocks into buffers and the other for
processing a page (pruning, checking tuple visibility, etc). As a
consequence, I had to change the other caller of heapgetpage() (sample
scans). Since I was doing this anyway, I made a few changes there. It
is arguable that those changes could be split up differently between
0001 and 0004. However, I wanted 0004 to be *only* the sequential scan
streaming read user code.
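
The shape of the split can be sketched roughly as below. This is a
simplified model rather than the patch itself: ToyPage, ToyScan, the
toy_* functions, and the sample data are hypothetical, and "visibility"
is reduced to a boolean per tuple. The point is only that fetching a
block and processing its page become separate steps that a caller
invokes one after the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TUPLES_PER_PAGE 4

/* Toy page: one visibility flag per tuple slot. */
typedef struct ToyPage
{
    bool visible[TOY_TUPLES_PER_PAGE];
} ToyPage;

/* Toy scan descriptor. */
typedef struct ToyScan
{
    const ToyPage *rel;         /* the "relation": an array of pages */
    int         nblocks;
    const ToyPage *cur_page;    /* stands in for the pinned buffer */
    int         cur_block;
    int         vistuples[TOY_TUPLES_PER_PAGE];    /* visible offsets */
    int         ntuples;
} ToyScan;

/* Part 1: get the block into a "buffer". Returns false past the end. */
static bool
toy_fetch_block(ToyScan *scan, int blkno)
{
    if (blkno < 0 || blkno >= scan->nblocks)
    {
        scan->cur_page = NULL;
        return false;
    }
    scan->cur_page = &scan->rel[blkno];
    scan->cur_block = blkno;
    return true;
}

/* Part 2: process the current page and collect its visible tuples. */
static void
toy_prepare_page(ToyScan *scan)
{
    assert(scan->cur_page != NULL);

    scan->ntuples = 0;
    for (int off = 0; off < TOY_TUPLES_PER_PAGE; off++)
    {
        if (scan->cur_page->visible[off])
            scan->vistuples[scan->ntuples++] = off;
    }
}

/* A caller that composes the two steps, as a scan would. */
static int
toy_count_visible(const ToyPage *rel, int nblocks)
{
    ToyScan     scan = {.rel = rel, .nblocks = nblocks};
    int         total = 0;

    for (int blk = 0; toy_fetch_block(&scan, blk); blk++)
    {
        toy_prepare_page(&scan);
        total += scan.ntuples;
    }
    return total;
}

/* Sample data for the sketch: two pages with 3 and 1 visible tuples. */
static const ToyPage toy_rel[2] = {
    {{true, false, true, true}},
    {{false, false, true, false}},
};
```

Because the fetch step is separate, a different source of buffers (for
example, a prefetching stream) can replace toy_fetch_block() without
touching the page processing step.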

There is an outstanding question about where to allocate the
PgStreamingRead object for sequential scans (see TODO in 0004).
However, I thought I would keep this thread focused on 0001 and 0002.

Though logically the performance with 0001 and 0002 should be the same
as master (no new non-inline function calls, no additional looping),
I've done a bit of profiling anyway. I created a large multi-GB table,
read it all into shared buffers (disabling the large sequential scan
bulkread optimization), and did a sequential SELECT count(*) from the
table. From the profiles below, you'll notice that master and the
patch are basically the same. Actual percentages vary from run to run.
Execution time is the same.

patch
  15.49%  postgres  postgres  [.] ExecInterpExpr
  11.03%  postgres  postgres  [.] heapgettup_pagemode
  10.85%  postgres  postgres  [.] ExecStoreBufferHeapTuple
   9.14%  postgres  postgres  [.] heap_getnextslot
   8.39%  postgres  postgres  [.] heapbuildvis
   6.47%  postgres  postgres  [.] SeqNext

master
  14.16%  postgres  postgres  [.] ExecInterpExpr
  11.54%  postgres  postgres  [.] heapgettup_pagemode
  10.63%  postgres  postgres  [.] ExecStoreBufferHeapTuple
  10.22%  postgres  postgres  [.] heap_getnextslot
   8.53%  postgres  postgres  [.] heapgetpage
   5.35%  postgres  postgres  [.] SeqNext

- Melanie

[1] https://www.postgresql.org/message-id/flat/CAAKRu_YSOnhKsDyFcqJsKtBSrd32DP-jjXmv7hL0BPD-z0TGXQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKGJkOiOCa%2Bmag4BF%2BzHo7qo%3Do9CFheB8%3Dg6uT5TUm2gkvA%40mail.gmail.com

Attachment                                                        Content-Type  Size
v1-0002-Replace-blocks-with-buffers-in-heapgettup-control.patch   text/x-patch  7.6 KB
v1-0003-Streaming-Read-API.patch                                  text/x-patch  56.0 KB
v1-0004-Sequential-scans-support-streaming-read.patch             text/x-patch  7.4 KB
v1-0001-Split-heapgetpage-into-two-parts.patch                    text/x-patch  8.0 KB
