Re: BitmapHeapScan streaming read user and prelim refactoring

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-03-02 15:05:07
Message-ID: 186bcba1-a0e1-4871-8ed2-0d301901d0ba@enterprisedb.com
Lists: pgsql-hackers

On 3/1/24 18:08, Tomas Vondra wrote:
>
> On 3/1/24 17:51, Melanie Plageman wrote:
>> On Fri, Mar 1, 2024 at 9:05 AM Tomas Vondra
>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>
>>> On 3/1/24 02:18, Melanie Plageman wrote:
>>>> On Thu, Feb 29, 2024 at 6:44 PM Tomas Vondra
>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>>
>>>>> On 2/29/24 23:44, Tomas Vondra wrote:
>>>>> 1) On master there's a clear difference between eic=0 and eic=1 cases, but
>>>>> on the patched build there's literally no difference - for example the
>>>>> "uniform" distribution is clearly not great for prefetching, but eic=0
>>>>> regresses to the poor eic=1 behavior.
>>>>
>>>> Yes, so eic=0 and eic=1 are identical with the streaming read API.
>>>> That is, eic 0 does not disable prefetching. Thomas is going to update
>>>> the streaming read API to avoid issuing an fadvise for the last block
>>>> in a range before issuing a read -- which would mean no prefetching
>>>> with eic 0 and eic 1. Not doing prefetching with eic 1 actually seems
>>>> like the right behavior -- which would be different than what master
>>>> is doing, right?
>>>
>>> I don't think we should stop doing prefetching for eic=1, or at least
>>> not based just on these charts. I suspect these "uniform" charts are not
>>> a great example for the prefetching, because it's about distribution of
>>> individual rows, and even a small fraction of rows may match most of the
>>> pages. It's great for finding strange behaviors / corner cases, but
>>> probably not a sufficient reason to change the default.
>>
>> Yes, I would like to see results from a data set where selectivity is
>> more correlated to pages/heap fetches. But, I'm not sure I see how
>> that is related to prefetching when eic = 1.
>>
>
> OK, I'll make that happen.
>

Here's a PDF with charts for a data set where the row selectivity is more
closely correlated with the page selectivity. I'm attaching the updated
script, with the SQL generating the data set. In short, all rows on a
single page have the same random value, so row selectivity and page
selectivity should be the same.
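A back-of-the-envelope sketch of why the two data sets behave so differently. It assumes each row in "uniform" matches independently with probability s, and uses a hypothetical 100 rows per page; the real generator is the SQL in the attached run.sh, and the function names here are made up for illustration. A page is fetched if at least one of its r rows matches, so the fraction of pages hit is 1 - (1 - s)^r, while in "uniform-pages" it is simply s.

```python
# Toy model with hypothetical parameters, not the benchmark in run.sh.

def page_selectivity_uniform(row_sel: float, rows_per_page: int) -> float:
    """Fraction of heap pages containing at least one matching row,
    assuming each row matches independently with probability row_sel."""
    return 1.0 - (1.0 - row_sel) ** rows_per_page


def page_selectivity_uniform_pages(row_sel: float) -> float:
    """All rows on a page share one value, so a page matches exactly
    when its rows do: page selectivity equals row selectivity."""
    return row_sel


if __name__ == "__main__":
    rows_per_page = 100  # assumption; the real value depends on tuple width
    for sel in (0.001, 0.01, 0.05, 0.1):
        uniform = page_selectivity_uniform(sel, rows_per_page)
        pages = page_selectivity_uniform_pages(sel)
        print(f"row sel {sel:>6}: uniform={uniform:.3f} uniform-pages={pages:.3f}")
```

With these assumptions, 1% row selectivity already touches roughly 63% of the pages in "uniform", which is why the "uniform-pages" curves change much more gradually.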

The first page has results for the original "uniform" data set, the
second page for the new "uniform-pages" data set. Each has 4 charts, for
master/patched with 0 and 4 parallel workers. The overall behavior is the
same, but for "uniform-pages" the change is much more gradual with
respect to row selectivity. I think that's expected.

As for how this relates to eic=1: my point was that these are
"adversarial" data sets, the ones most likely to show regressions. This
applies especially to the "uniform" data set, because as the row
selectivity grows, it becomes more and more likely that the next matching
page is right after the current one, so read-ahead would likely do the
trick.
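A toy simulation of the read-ahead argument, assuming pages match independently with probability p (as in "uniform", where p = 1 - (1 - s)^r for row selectivity s and r rows per page). The gap to the next matching page is then geometric, so the next matching page is directly adjacent with probability p, and once p is high most accesses look sequential. The function name and parameters are hypothetical, and this ignores how the kernel actually detects sequential access.

```python
import random


def fraction_adjacent(page_sel: float, n_pages: int, seed: int = 0) -> float:
    """Among consecutive pairs of matching pages, return the fraction where
    the next matching page immediately follows the current one (gap == 1).

    Assumes each page matches independently with probability page_sel."""
    rng = random.Random(seed)
    matching = [i for i in range(n_pages) if rng.random() < page_sel]
    if len(matching) < 2:
        return 0.0  # not enough matches to form a pair
    adjacent = sum(1 for a, b in zip(matching, matching[1:]) if b - a == 1)
    return adjacent / (len(matching) - 1)


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        print(f"page sel {p}: adjacent fraction ~ {fraction_adjacent(p, 100_000):.3f}")
```

The simulated fraction tracks p, which is the sense in which high-selectivity "uniform" runs degenerate into a mostly sequential scan.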

Also, these tests force a bitmap scan plan; the planner might pick some
other scan type for many of these cases, making the regression somewhat
irrelevant. Not entirely, though, because we make planning mistakes, and
for robustness reasons it's good to keep the regression small.

But that's just how I think about it now. I don't have some grand theory
that would dictate we have to do prefetching for eic=1.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:
- uniform-pages.pdf (application/pdf, 493.7 KB)
- run.sh (application/x-shellscript, 7.8 KB)
