Re: Custom explain options

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom explain options
Date: 2024-01-13 14:51:20
Message-ID: 50bf548d-44a1-49d8-aac9-6e4988badaae@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/12/24 20:30, Konstantin Knizhnik wrote:
>
> On 12/01/2024 7:03 pm, Tomas Vondra wrote:
>> On 10/21/23 14:16, Konstantin Knizhnik wrote:
>>> Hi hackers,
>>>
>>> EXPLAIN statement has a list of options (i.e. ANALYZE, BUFFERS,
>>> COST,...) which help to provide useful details of query execution.
>>> In Neon we have added PREFETCH option which shows information about page
>>> prefetching during query execution (prefetching is more critical for
>>> Neon
>>> architecture because of separation of compute and storage, so it is
>>> implemented not only for bitmap heap scan as in Vanilla Postgres, but
>>> also for seqscan, indexscan and indexonly scan). Another possible
>>> candidate  for explain options is local file cache (extra caching layer
>>> above shared buffers which is used to somehow replace file system cache
>>> in standalone Postgres).
>> Not quite related to this patch about EXPLAIN options, but can you share
>> some details how you implemented prefetching for the other nodes?
>>
>> I'm asking because I've been working on prefetching for index scans, so
>> I'm wondering if there's a better way to do this, or how to do it in a
>> way that would allow neon to maybe leverage that too.
>>
>> regards
>>
> Yes, I am looking at your PR. What we have implemented in Neon is more
> specific to Neon architecture where storage is separated from compute.
> So each page not found in shared buffers has to be downloaded from page
> server. It adds quite noticeable latency, because of network roundtrip.
> While vanilla Postgres can rely on OS file system cache when page is not
> found in shared buffer (access to OS file cache is certainly slower than
> to shared buffers
> because of syscall and copying of page, but performance penaly is not
> very large - less than 15%), Neon has no local files and so has to send
> request to the socket.
>
> This is why we have to perform aggressive prefetching whenever it is
> possible (when it it is possible to predict order of subsequent pages).
> Unlike vanilla Postgres which implements prefetch only for bitmap heap
> scan, we have implemented it for seqscan, index scan, indexonly scan,
> bitmap heap scan, vacuum, pg_prewarm.
> The main difference between Neon prefetch and vanilla Postgres prefetch
> is that first one is backend specific. So each backend prefetches only
> pages which it needs.
> This is why we have to rewrite prefetch for bitmap heap scan, which is
> using `fadvise` and assumes that pages prefetched by one backend in file
> cache, can be used by any other backend.
>

I do understand why prefetching is important in neon (likely more than
for core postgres). I'm interested in how it's actually implemented,
whether it's somehow similar to how my patch does things or in some
different (perhaps neon-specific way), and if the approaches are
different then what are the pros/cons. And so on.

So is it implemented in the neon-specific storage, somehow, or where/how
does neon issue the prefetch requests?

>
> Concerning index scan we have implemented two different approaches: for
> index only scan we  try to prefetch leave pages and for index scan we
> prefetch referenced heap pages.

In my experience the IOS handling (only prefetching leaf pages) is very
limiting, and may easily lead to index-only scans being way slower than
regular index scans. Which is super surprising for users. It's why I
ended up improving the prefetcher to optionally look at the VM etc.

> In both cases we start from prefetch distance 0 and increase it until it
> reaches `effective_io_concurrency` for this relation. Doing so we try to
> avoid prefetching of useless pages and slowdown of "point" lookups
> returning one or few records.
>

Right, the regular prefetch ramp-up. My patch does the same thing.

> If you are interested, you can look at our implementation in neon repo:
> all source are available. But briefly speaking, each backend has its own
> prefetch ring (prefetch requests which are waiting for response). The
> key idea is that we can send several prefetch requests to page server
> and then receive multiple replies. It allows to increased speed of OLAP
> queries up to 10 times.
>

Can you point me to the actual code / branch where it happens? I did
check the github repo, but I don't see anything relevant in the default
branch (REL_15_STABLE_neon). There are some "prefetch" branches, but
those seem abandoned.

> Heikki thinks that prefetch can be somehow combined with async-io
> proposal (based on io_uring). But right now they have nothing in common.
>

I can imagine async I/O being useful here, but I find the flow of I/O
requests is quite complex / goes through multiple layers. Or maybe I
just don't understand how it should work.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2024-01-13 15:00:03 Re: POC: GROUP BY optimization
Previous Message Jelte Fennema-Nio 2024-01-13 14:37:31 Re: [PATCH] Add additional extended protocol commands to psql: \parse and \bindx