Re: BitmapHeapScan streaming read user and prelim refactoring

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-18 15:55:12
Message-ID: ef92e929-fa35-4fea-acc2-259486a08bba@enterprisedb.com
Lists: pgsql-hackers

On 3/18/24 15:47, Melanie Plageman wrote:
> On Sun, Mar 17, 2024 at 3:21 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 3/14/24 22:39, Melanie Plageman wrote:
>>> On Thu, Mar 14, 2024 at 5:26 PM Tomas Vondra
>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>
>>>> On 3/14/24 19:16, Melanie Plageman wrote:
>>>>> On Thu, Mar 14, 2024 at 03:32:04PM +0200, Heikki Linnakangas wrote:
>>>>>> ...
>>>>>>
>>>>>> Ok, committed that for now. Thanks for looking!
>>>>>
>>>>> Attached v6 is rebased over your new commit. It also has the "fix" in
>>>>> 0010 which moves BitmapAdjustPrefetchIterator() back above
>>>>> table_scan_bitmap_next_block(). I've also updated the Streaming Read API
>>>>> commit (0013) to Thomas' v7 version from [1]. This has the update that
>>>>> we theorize should address some of the regressions in the bitmapheapscan
>>>>> streaming read user in 0014.
>>>>>
>>>>
>>>> Should I rerun the benchmarks with these new patches, to see if they
>>>> really help with the regressions?
>>>
>>> That would be awesome!
>>>
>>
>> OK, here are a couple of charts comparing the effect of the v6 patches to
>> master. These are from the 1M and 10M data sets, same as the runs presented
>> earlier in this thread (the 10M run is still in progress, but the results
>> should be good enough for this kind of visual comparison).
>
> Thanks for doing this!
>
>> What is even more obvious is that 0014 behaves *VERY* differently. Whether
>> that's a good thing or a problem is debatable/unclear. I'm sure we don't
>> want to cause regressions, but perhaps those are due to the prefetch issue
>> discussed elsewhere in this thread (identified by Andres and Melanie).
>> There are also many cases that got much faster, but the question is whether
>> that's due to better efficiency or the new code simply being more
>> aggressive in some way (not sure).
>
> Are these with the default effective_io_concurrency (1)? If so, the
> "effective" prefetch distance in many cases will be higher with the
> streaming read code applied. With effective_io_concurrency 1,
> "max_ios" will always be 1, but the number of blocks prefetched may
> exceed this (up to MAX_BUFFERS_PER_TRANSFER) because the streaming
> read code is always trying to build bigger IOs. And, if prefetching,
> it will prefetch IOs not yet in shared buffers before reading them.
>

No, it's a mix of runs with random combinations of these parameters:

dataset: uniform uniform_pages linear linear_fuzz cyclic cyclic_fuzz
workers: 0 4
work_mem: 128kB 4MB 64MB
eic: 0 1 8 16 32
selectivity: 0-100%

I can either share the data (~70MB of CSV) or generate charts for
results with some filter.
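
Regarding the max_ios / MAX_BUFFERS_PER_TRANSFER behavior you describe above,
here's a toy standalone model of how I understand it - at most one IO in
flight (max_ios = 1 for eic = 1), but that single IO may cover many adjacent
blocks, so the effective prefetch distance is larger. This is only my reading
of your description, not the actual read_stream code, and the constant values
and block numbers are placeholders:

/*
 * Toy model only: how merging adjacent blocks into one IO can make the
 * effective prefetch distance exceed max_ios.  Not the read_stream code.
 */
#include <stdio.h>

#define MAX_IOS 1                   /* eic = 1 => at most one IO in flight */
#define MAX_BUFFERS_PER_TRANSFER 16 /* placeholder cap on blocks per IO */

int
main(void)
{
    /* made-up stream of block numbers produced by the bitmap iterator */
    int blocks[] = {10, 11, 12, 13, 40, 41, 42, 80};
    int nblocks = sizeof(blocks) / sizeof(blocks[0]);
    int i = 0;

    while (i < nblocks)
    {
        int start = blocks[i];
        int len = 1;

        /* keep merging blocks into one IO while they are consecutive */
        while (i + len < nblocks &&
               blocks[i + len] == start + len &&
               len < MAX_BUFFERS_PER_TRANSFER)
            len++;

        /*
         * Only MAX_IOS (= 1) IO is started ahead of the reads, but it may
         * cover up to MAX_BUFFERS_PER_TRANSFER blocks, so the number of
         * blocks effectively prefetched exceeds max_ios.
         */
        printf("one IO: blocks %d..%d (%d blocks, max_ios = %d)\n",
               start, start + len - 1, len, MAX_IOS);

        i += len;
    }
    return 0;
}

If that's roughly right, it would explain why even the eic = 1 runs can end
up with a larger effective prefetch distance than master.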

> It's hard to tell, without digging into a specific repro, why this would
> make some queries much slower. In the forced bitmapheapscan cases, it would
> make sense that more prefetching is worse -- that's presumably why the
> planner wouldn't have chosen a bitmapheapscan plan in the first place. But
> in the cases where the planner did pick it, it's unclear why it would be
> worse.
>

Yes, not sure about the optimal cases. I'll wait for the 10M runs to
complete, and then we can look for some patterns.

> I don't think there is any way it could be the issue Andres
> identified, because there is only one iterator. Nothing to get out of
> sync. It could be that the fadvises are being issued too close to the
> reads and aren't effective enough at covering up read latency on
> slower, older hardware. But that doesn't explain why master would
> sometimes be faster.
>

Ah, right, thanks for the clarification. I forgot the streaming read API
does not use the two-iterator approach.
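
On the fadvise timing point, the pattern I have in mind is roughly the sketch
below: posix_fadvise(POSIX_FADV_WILLNEED) followed almost immediately by a
read of the same block. This is only an illustration of the concern (the file
name, block number and block size are made up, and it's not what the streaming
read code actually does) - the point is just that if the pread() happens right
after the advice, the kernel has had little time to complete the readahead,
so the IO latency isn't hidden.

/*
 * Sketch of the "fadvise issued too close to the read" concern.  Not
 * PostgreSQL code; the file, block number and block size are made up.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(int argc, char **argv)
{
    char    buf[BLCKSZ];
    off_t   off = 100 * (off_t) BLCKSZ; /* some block we know we'll need */
    int     fd;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <datafile>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /* hint the kernel that we'll need this block soon */
    (void) posix_fadvise(fd, off, BLCKSZ, POSIX_FADV_WILLNEED);

    /*
     * If the read follows immediately, the readahead started by the advice
     * has had almost no time to complete, so we still wait for the IO here.
     * The advice only pays off if there is useful work between the calls.
     */
    if (pread(fd, buf, BLCKSZ, off) < 0)
        perror("pread");

    close(fd);
    return 0;
}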

> Probably the only thing we can do is get into a repro. It would, of
> course, be easiest to do this with a serial query. I can dig into the
> scripts you shared earlier and try to find a good repro. Because the
> regressions may have shifted with Thomas' new version, it would help
> if you shared a category (cyclic/uniform/etc, parallel or serial, eic
> value, work mem, etc) where you now see the most regressions.
>

OK, I've restarted the tests with only the 0012 and 0014 patches applied, and
I'll wait for them to complete - I don't want to start looking for patterns
until we have enough data to smooth out the noise.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
