Re: BitmapHeapScan streaming read user and prelim refactoring

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-02 23:39:31
Message-ID: CAAKRu_YEmhqKRwqaWg4qLAa1_W7X=g4v6Ljs9YPSPKwHpicO+A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 2, 2024 at 5:51 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 3/2/24 23:11, Melanie Plageman wrote:
> > On Fri, Mar 1, 2024 at 2:31 PM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> >>
> >> ...
> >>
> >> Hold the phone on this one. I realized why I moved
> >> BitmapAdjustPrefetchIterator after table_scan_bitmap_next_block() in
> >> the first place -- master calls BitmapAdjustPrefetchIterator after the
> >> tbm_iterate() for the current block -- otherwise with eic = 1, it
> >> considers the prefetch iterator behind the current block iterator. I'm
> >> going to go through and figure out what order this must be done in and
> >> fix it.
> >
> > So, I investigated this further, and, as far as I can tell, for
> > parallel bitmapheapscan the timing around when workers decrement
> > prefetch_pages causes the performance differences with patch 0010
> > applied. It makes very little sense to me, but some of the queries I
> > borrowed from your regression examples are up to 30% slower when this
> > code from BitmapAdjustPrefetchIterator() is after
> > table_scan_bitmap_next_block() instead of before it.
> >
> >     SpinLockAcquire(&pstate->mutex);
> >     if (pstate->prefetch_pages > 0)
> >         pstate->prefetch_pages--;
> >     SpinLockRelease(&pstate->mutex);
> >
> > I did some stracing and did see much more time spent in futex/wait
> > with this code after the call to table_scan_bitmap_next_block() vs
> > before it (table_scan_bitmap_next_block() calls ReadBuffer()).
> >
> > In my branch, I've now moved only the parallel prefetch_pages-- code
> > to before table_scan_bitmap_next_block().
> > https://github.com/melanieplageman/postgres/tree/bhs_pgsr
> > I'd be interested to know if you see the regressions go away with 0010
> > applied (commit message "Make table_scan_bitmap_next_block() async
> > friendly" and sha bfdcbfee7be8e2c461).
> >
>
> I'll give this a try once the runs with MAX_BUFFERS_PER_TRANSFER=1
> complete. But it seems really bizarre that simply moving this code a
> little bit would cause such a regression ...

Yes, it is bizarre. The difference also might not reproduce in cases
other than the one I was testing (cyclic dataset, uncached, eic=8,
matches 16+, distinct=100, rows=100000000, 4 parallel workers). But
even if it only affects that one case, moving those 5 lines from before
to after table_scan_bitmap_next_block() had a major, reproducible
performance impact.
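To make the two orderings concrete, here is a minimal, self-contained model of the quoted lines. It is a sketch, not PostgreSQL code: a pthread mutex stands in for the spinlock, and the ModelSharedState type, the model_* functions, and the read callback (standing in for table_scan_bitmap_next_block()) are all hypothetical names. Both orderings do the same bookkeeping; the only difference is that in the read-first ordering the shared prefetch_pages value stays at its old value for as long as the read takes, so any other worker consulting it in that window sees the larger distance.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the shared parallel scan state. prefetch_pages
 * is how many pages the shared prefetch iterator is ahead of the main
 * iterator. A pthread mutex replaces the spinlock in this model.
 */
typedef struct ModelSharedState
{
    pthread_mutex_t mutex;
    int             prefetch_pages;
} ModelSharedState;

/* Hypothetical callback standing in for table_scan_bitmap_next_block(). */
typedef bool (*ModelReadBlock) (void *arg);

static void
model_init(ModelSharedState *state, int prefetch_pages)
{
    pthread_mutex_init(&state->mutex, NULL);
    state->prefetch_pages = prefetch_pages;
}

/*
 * Same logic as the quoted lines: the main iterator consumed a block, so
 * the prefetch distance shrinks by one, never going below zero. Returns
 * true if the counter was decremented.
 */
static bool
model_close_prefetch_distance(ModelSharedState *state)
{
    bool        decremented = false;

    pthread_mutex_lock(&state->mutex);
    if (state->prefetch_pages > 0)
    {
        state->prefetch_pages--;
        decremented = true;
    }
    pthread_mutex_unlock(&state->mutex);
    return decremented;
}

/* Ordering now on the branch: shrink the shared distance, then read. */
static bool
model_next_block_decrement_first(ModelSharedState *state,
                                 ModelReadBlock read_block, void *arg)
{
    model_close_prefetch_distance(state);
    return read_block(arg);
}

/*
 * Ordering that was slower in the runs described above: read first, so
 * other workers see the old prefetch_pages value while the read is running.
 */
static bool
model_next_block_read_first(ModelSharedState *state,
                            ModelReadBlock read_block, void *arg)
{
    bool        found = read_block(arg);

    model_close_prefetch_distance(state);
    return found;
}

/* Hypothetical fake read: records the distance visible during the "I/O". */
typedef struct ModelReadProbe
{
    ModelSharedState *state;
    int             seen_prefetch_pages;
} ModelReadProbe;

static bool
model_probe_read(void *arg)
{
    ModelReadProbe *probe = arg;

    pthread_mutex_lock(&probe->state->mutex);
    probe->seen_prefetch_pages = probe->state->prefetch_pages;
    pthread_mutex_unlock(&probe->state->mutex);
    return true;
}
```

Starting both orderings from prefetch_pages = 2, the read in the decrement-first ordering observes 1 while the read in the read-first ordering observes 2; both end at 1. The model says nothing about timing on real hardware; it only pins down which value is visible while the read is in progress.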

The same number of reads and fadvises are being issued overall.
However, I did notice that the pread calls are skewed when those
lines of code are after table_scan_bitmap_next_block(): fewer of
the workers are doing more of the reads. Perhaps this imbalance
explains the extra time. Why those workers would end up doing more
of the reads, I don't quite know.
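One way to put a number on that skew is to compare the busiest worker's pread count with an even split. The helper below is hypothetical (not part of PostgreSQL or strace); it assumes the per-worker counts have already been tallied, for example from per-PID strace output, and the counts in the note after it are made-up numbers that only illustrate the arithmetic, not measurements from these runs.

```c
#include <stddef.h>

/*
 * Hypothetical imbalance metric: the busiest worker's read count divided
 * by the mean read count. 1.0 means perfectly even; with n workers the
 * maximum is n (one worker did every read). Returns 0.0 if there were
 * no workers or no reads.
 */
static double
read_imbalance(const long *reads_per_worker, size_t nworkers)
{
    long        total = 0;
    long        busiest = 0;

    for (size_t i = 0; i < nworkers; i++)
    {
        total += reads_per_worker[i];
        if (reads_per_worker[i] > busiest)
            busiest = reads_per_worker[i];
    }
    if (nworkers == 0 || total == 0)
        return 0.0;
    return (double) busiest * (double) nworkers / (double) total;
}
```

For 400 reads across 4 workers, counts of {100, 100, 100, 100} give 1.0, while {160, 120, 80, 40} give 1.6: the total I/O is identical, but the busiest worker issues 60% more reads than an even share.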

- Melanie
