Re: BitmapHeapScan streaming read user and prelim refactoring

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-01 19:31:41
Message-ID: CAAKRu_Yos+65UrYcRtF3GQARfvrd8qgp5dO4p=1JzSaP=7cqog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 29, 2024 at 7:29 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Thu, Feb 29, 2024 at 5:44 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> >
> >
> > On 2/29/24 22:19, Melanie Plageman wrote:
> > > On Thu, Feb 29, 2024 at 7:54 AM Tomas Vondra
> > > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > >>
> > >>
> > >>
> > >> On 2/29/24 00:40, Melanie Plageman wrote:
> > >>> On Wed, Feb 28, 2024 at 6:17 PM Tomas Vondra
> > >>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 2/28/24 21:06, Melanie Plageman wrote:
> > >>>>> On Wed, Feb 28, 2024 at 2:23 PM Tomas Vondra
> > >>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > >>>>>>
> > >>>>>> On 2/28/24 15:56, Tomas Vondra wrote:
> > >>>>>>>> ...
> > >>>>>>>
> > >>>>>>> Sure, I can do that. It'll take a couple hours to get the results, I'll
> > >>>>>>> share them when I have them.
> > >>>>>>>
> > >>>>>>
> > >>>>>> Here are the results with only patches 0001 - 0012 applied (i.e. without
> > >>>>>> the patch introducing the streaming read API, and the patch switching
> > >>>>>> the bitmap heap scan to use it).
> > >>>>>>
> > >>>>>> The changes in performance don't disappear entirely, but the scale is
> > >>>>>> certainly much smaller - both in the complete results for all runs, and
> > >>>>>> for the "optimal" runs that would actually pick bitmapscan.
> > >>>>>
> > >>>>> Hmm. I'm trying to think how my refactor could have had this impact.
> > >>>>> It seems like all the most notable regressions are with 4 parallel
> > >>>>> workers. What do the numeric column labels mean across the top
> > >>>>> (2,4,8,16...) -- are they related to "matches"? And if so, what does
> > >>>>> that mean?
> > >>>>>
> > >>>>
> > >>>> That's the number of distinct values matched by the query, which should
> > >>>> be an approximation of the number of matching rows. The number of
> > >>>> distinct values in the data set differs by data set, but for 1M rows
> > >>>> it's roughly like this:
> > >>>>
> > >>>> uniform: 10k
> > >>>> linear: 10k
> > >>>> cyclic: 100
> > >>>>
> > >>>> So for example matches=128 means ~1% of rows for uniform/linear, and
> > >>>> 100% for cyclic data sets.
> > >>>
> > >>> Ah, thank you for the explanation. I also looked at your script after
> > >>> having sent this email and saw that it is clear in your script what
> > >>> "matches" is.
> > >>>
> > >>>> As for the possible cause, I think it's clear most of the difference
> > >>>> comes from the last patch that actually switches bitmap heap scan to the
> > >>>> streaming read API. That's mostly expected/understandable, although we
> > >>>> probably need to look into the regressions or cases with e_i_c=0.
> > >>>
> > >>> Right, I'm mostly surprised about the regressions for patches 0001-0012.
> > >>>
> > >>> Re eic 0: Thomas Munro and I chatted off-list, and you bring up a
> > >>> great point about eic 0. In old bitmapheapscan code eic 0 basically
> > >>> disabled prefetching but with the streaming read API, it will still
> > >>> issue fadvises when eic is 0. That is an easy one line fix. Thomas
> > >>> prefers to fix it by always avoiding an fadvise for the last buffer in
> > >>> a range before issuing a read (since we are about to read it anyway,
> > >>> best not fadvise it too). This will fix eic 0 and also cut one system
> > >>> call from each invocation of the streaming read machinery.
> > >>>
> > >>>> To analyze the 0001-0012 patches, maybe it'd be helpful to run tests for
> > >>>> individual patches. I can try doing that tomorrow. It'll have to be a
> > >>>> limited set of tests, to reduce the time, but might tell us whether it's
> > >>>> due to a single patch or multiple patches.
> > >>>
> > >>> Yes, tomorrow I planned to start trying to repro some of the "red"
> > >>> cases myself. Any one of the commits could cause a slight regression
> > >>> but a 3.5x regression is quite surprising, so I might focus on trying
> > >>> to repro that locally and then narrow down which patch causes it.
> > >>>
> > >>> For the non-cached regressions, perhaps the commit to use the correct
> > >>> recheck flag (0004) when prefetching could be the culprit. And for the
> > >>> cached regressions, my money is on the commit which changes the whole
> > >>> control flow of BitmapHeapNext() and the next_block() and next_tuple()
> > >>> functions (0010).
> > >>>
> > >>
> > >> I do have some partial results, comparing the patches. I only ran one of
> > >> the more affected workloads (cyclic) on the xeon, attached is a PDF
> > >> comparing master and the 0001-0014 patches. The percentages are timing
> > >> vs. the preceding patch (green - faster, red - slower).
> > >
> > > Just confirming: the results are for uncached?
> > >
> >
> > Yes, cyclic data set, uncached case. I picked this because it seemed
> > like one of the most affected cases. Do you want me to test some other
> > cases too?
>
> So, I actually may have found the source of at least part of the
> regression with 0010. I was able to reproduce the regression with
> patch 0010 applied for the unached case with 4 workers and eic 8 and
> 100000000 rows for the cyclic dataset. I see it for all number of
> matches. The regression went away (for this specific example) when I
> moved the BitmapAdjustPrefetchIterator call back up to before the call
> to table_scan_bitmap_next_block() like this:
>
> diff --git a/src/backend/executor/nodeBitmapHeapscan.c
> b/src/backend/executor/nodeBitmapHeapscan.c
> index f7ecc060317..268996bdeea 100644
> --- a/src/backend/executor/nodeBitmapHeapscan.c
> +++ b/src/backend/executor/nodeBitmapHeapscan.c
> @@ -279,6 +279,8 @@ BitmapHeapNext(BitmapHeapScanState *node)
> }
>
> new_page:
> + BitmapAdjustPrefetchIterator(node, node->blockno);
> +
> if (!table_scan_bitmap_next_block(scan, &node->recheck,
> &lossy, &node->blockno))
> break;
>
> @@ -287,7 +289,6 @@ new_page:
> else
> node->exact_pages++;
>
> - BitmapAdjustPrefetchIterator(node, node->blockno);
> /* Adjust the prefetch target */
> BitmapAdjustPrefetchTarget(node);
> }
>
> It makes sense this would fix it. I haven't tried all the combinations
> you tried. Do you mind running your tests with the new code? I've
> pushed it into this branch.
> https://github.com/melanieplageman/postgres/commits/bhs_pgsr/

Hold the phone on this one. I realized why I moved
BitmapAdjustPrefetchIterator after table_scan_bitmap_next_block() in
the first place -- master calls BitmapAdjustPrefetchIterator after the
tbm_iterate() for the current block -- otherwise with eic = 1, it
considers the prefetch iterator behind the current block iterator. I'm
going to go through and figure out what order this must be done in and
fix it.

- Melanie

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Jungwirth 2024-03-01 20:38:27 Re: SQL:2011 application time
Previous Message Cary Huang 2024-03-01 19:14:43 Re: [Patch] add multiple client certificate selection feature