Re: BitmapHeapScan streaming read user and prelim refactoring

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-14 20:58:09
Message-ID: CA+hUKGLvKtf6sGo-YbW8cOf+SP6w90202Ut2ZVz10V+Fj0+KTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 15, 2024 at 3:18 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> So, IIUC this means (1) the patched code is more aggressive wrt
> prefetching (because we prefetch more data overall, because master would
> prefetch N pages and patched prefetches N ranges, each of which may be
> multiple pages. And (2) it's not easy to quantify how much more
> aggressive it is, because it depends on how we happen to coalesce the
> pages into ranges.
>
> Do I understand this correctly?

Yes.

Parallelism must prevent coalescing here though. Any parallel aware
executor node that allocates block numbers to workers without trying
to preserve ranges will. That not only hides the opportunity to
coalesce reads, it also makes (globally) sequential scans look random
(ie locally they are more random), so that our logic to avoid issuing
advice for sequential scan won't work, and we'll inject extra useless
or harmful (?) fadvise calls. I don't know what to do about that yet,
but it seems like a subject for future research. Should we recognise
sequential scans with a window (like Linux does), instead of strictly
next-block detection (like some other OSes do)? Maybe a shared
streaming read that all workers pull blocks from, so it can see what's
going on? I think the latter would be strictly more like what the ad
hoc BHS prefetching code in master is doing, but I don't know if it'd
be over-engineering, or hard to do for some reason.

Another aspect of per-backend streaming reads in one parallel query
that don't know about each other is that they will all have their own
effective_io_concurrency limit. That is a version of a problem that
comes up again and again in parallel query, to be solved by the grand
unified resource control system of the future.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-03-14 20:59:50 Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Previous Message Tom Lane 2024-03-14 20:56:39 Re: Recent 027_streaming_regress.pl hangs