Re: BitmapHeapScan streaming read user and prelim refactoring

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring
Date: 2024-04-06 00:51:45
Message-ID: a76f657a-5ce9-4467-b832-356c881db981@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 4/6/24 01:53, Melanie Plageman wrote:
> On Fri, Apr 05, 2024 at 04:06:34AM -0400, Melanie Plageman wrote:
>> On Thu, Apr 04, 2024 at 04:35:45PM +0200, Tomas Vondra wrote:
>>>
>>>
>>> On 4/4/24 00:57, Melanie Plageman wrote:
>>>> On Sun, Mar 31, 2024 at 11:45:51AM -0400, Melanie Plageman wrote:
>>>>> On Fri, Mar 29, 2024 at 12:05:15PM +0100, Tomas Vondra wrote:
>>>>>>
>>>>>>
>>>>>> On 3/29/24 02:12, Thomas Munro wrote:
>>>>>>> On Fri, Mar 29, 2024 at 10:43 AM Tomas Vondra
>>>>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>>>>> I think there's some sort of bug, triggering this assert in heapam
>>>>>>>>
>>>>>>>> Assert(BufferGetBlockNumber(hscan->rs_cbuf) == tbmres->blockno);
>>>>>>>
>>>>>>> Thanks for the repro. I can't seem to reproduce it (still trying) but
>>>>>>> I assume this is with Melanie's v11 patch set which had
>>>>>>> v11-0016-v10-Read-Stream-API.patch.
>>>>>>>
>>>>>>> Would you mind removing that commit and instead applying the v13
>>>>>>> stream_read.c patches[1]? v10 stream_read.c was a little confused
>>>>>>> about random I/O combining, which I fixed with a small adjustment to
>>>>>>> the conditions for the "if" statement right at the end of
>>>>>>> read_stream_look_ahead(). Sorry about that. The fixed version, with
>>>>>>> eic=4, with your test query using WHERE a < a, ends its scan with:
>>>>>>>
>>>>>>
>>>>>> I'll give that a try. Unfortunately unfortunately the v11 still has the
>>>>>> problem I reported about a week ago:
>>>>>>
>>>>>> ERROR: prefetch and main iterators are out of sync
>>>>>>
>>>>>> So I can't run the full benchmarks :-( but master vs. streaming read API
>>>>>> should work, I think.
>>>>>
>>>>> Odd, I didn't notice you reporting this ERROR popping up. Now that I
>>>>> take a look, v11 (at least, maybe also v10) had this very sill mistake:
>>>>>
>>>>>   if (scan->bm_parallel == NULL &&
>>>>>       scan->rs_pf_bhs_iterator &&
>>>>>       hscan->pfblockno > hscan->rs_base.blockno)
>>>>>        elog(ERROR, "prefetch and main iterators are out of sync");
>>>>>
>>>>> It errors out if the prefetch block is ahead of the current block --
>>>>> which is the opposite of what we want. I've fixed this in attached v12.
>>>>>
>>>>> This version also has v13 of the streaming read API. I noticed one
>>>>> mistake in my bitmapheap scan streaming read user -- it freed the
>>>>> streaming read object at the wrong time. I don't know if this was
>>>>> causing any other issues, but it at least is fixed in this version.
>>>>
>>>> Attached v13 is rebased over master (which includes the streaming read
>>>> API now). I also reset the streaming read object on rescan instead of
>>>> creating a new one each time.
>>>>
>>>> I don't know how much chance any of this has of going in to 17 now, but
>>>> I thought I would start looking into the regression repro Tomas provided
>>>> in [1].
>>>>
>>>
>>> My personal opinion is that we should try to get in as many of the the
>>> refactoring patches as possible, but I think it's probably too late for
>>> the actual switch to the streaming API.
>>
>> Cool. In the attached v15, I have dropped all commits that are related
>> to the streaming read API and included *only* commits that are
>> beneficial to master. A few of the commits are merged or reordered as
>> well.
>>
>> While going through the commits with this new goal in mind (forget about
>> the streaming read API for now), I realized that it doesn't make much
>> sense to just eliminate the layering violation for the current block and
>> leave it there for the prefetch block. I had de-prioritized solving this
>> when I thought we would just delete the prefetch code and replace it
>> with the streaming read.
>>
>> Now that we aren't doing that, I've spent the day trying to resolve the
>> issues with pushing the prefetch code into heapam.c that I cited in [1].
>> 0010 - 0013 are the result of this. They are not very polished yet and
>> need more cleanup and review (especially 0011, which is probably too
>> large), but I am happy with the solution I came up with.
>>
>> Basically, there are too many members needed for bitmap heap scan to put
>> them all in the HeapScanDescData (don't want to bloat it). So, I've made
>> a new BitmapHeapScanDescData and associated begin/rescan/end() functions
>>
>> In the end, with all patches applied, BitmapHeapNext() loops invoking
>> table_scan_bitmap_next_tuple() and table AMs can implement that however
>> they choose.
>>
>>>> I'm also not sure if I should try and group the commits into fewer
>>>> commits now or wait until I have some idea of whether or not the
>>>> approach in 0013 and 0014 is worth pursuing.
>>>>
>>>
>>> You mean whether to pursue the approach in general, or for v17? I think
>>> it looks like the right approach, but for v17 see above :-(
>>>
>>> As for merging, I wouldn't do that. I looked at the commits and while
>>> some of them seem somewhat "trivial", I really like how you organized
>>> the commits, and kept those that just "move" code around, and those that
>>> actually change stuff. It's much easier to understand, IMO.
>>>
>>> I went through the first ~10 commits, and added some review - either as
>>> a separate commit, when possible, in the code as XXX comment, and also
>>> in the commit message. The code tweaks are utterly trivial (whitespace
>>> or indentation to make the line shorter). It shouldn't take much time to
>>> deal with those, I think.
>>
>> Attached v15 incorporates your v14-0002-review.
>>
>> For your v14-0008-review, I actually ended up removing that commit
>> because once I removed everything that was for streaming read API, it
>> became redundant with another commit.
>>
>> For your v14-0010-review, we actually can't easily get rid of those
>> local variables because we make the iterators before we make the scan
>> descriptors and the commit following that commit moves the iterators
>> from the BitmapHeapScanState to the scan descriptor.
>>
>>> I think the main focus should be updating the commit messages. If it was
>>> only a single patch, I'd probably try to write the messages myself, but
>>> with this many patches it'd be great if you could update those and I'll
>>> review that before commit.
>>
>> I did my best to update the commit messages to be less specific and more
>> focused on "why should I care". I found myself wanting to explain why I
>> implemented something the way I did and then getting back into the
>> implementation details again. I'm not sure if I suceeded in having less
>> details and more substance.
>>
>>> I always struggle with writing commit messages myself, and it takes me
>>> ages to write a good one (well, I think the message is good, but who
>>> knows ...). But I think a good message should be concise enough to
>>> explain what and why it's done. It may reference a thread for all the
>>> gory details, but the basic reasoning should be in the commit message.
>>> For example the message for "BitmapPrefetch use prefetch block recheck
>>> for skip fetch" now says that it "makes more sense to do X" but does not
>>> really say why that's the case. The linked message does, but it'd be
>>> good to have that in the message (because how would I know how much of
>>> the thread to read?).
>>
>> I fixed that particular one. I tried to take that feedback and apply it
>> to other commit messages. I don't know how successful I was...
>>
>>> Also, it'd be very helpful if you could update the author & reviewed-by
>>> fields. I'll review those before commit, ofc, but I admit I lost track
>>> of who reviewed which part.
>>
>> I have updated reviewers. I didn't add reviewers on the ones that
>> haven't really been reviewed yet.
>>
>>> I'd focus on the first ~8-9 commits or so for now, we can commit more if
>>> things go reasonably well.
>>
>> Sounds good. I will spend cleanup time on 0010-0013 tomorrow but would
>> love to know if you agree with the direction before I spend more time.
>
> In attached v16, I've split out 0010-0013 into 0011-0017. I think it is
> much easier to understand.
>

Damn it, I went through the whole patch series, adding a couple review
comments and tweaks, and was just about to share my version, but you bet
me to it ;-)

Anyway, I've attached it as .tgz in order to not confuse cfbot. All the
review comments are marked with XXX, so grep for that in the patches.
There's two separate patches - the first one suggests a code change, so
it was better to not merge that with your code. The second has just a
couple XXX comments, I'm not sure why I kept it separate.

A couple review comments:

* I think 0001-0009 are 99% ready to. I reworded some of the commit
messages a bit - I realize it's a bit bold, considering you're native
speaker and I'm not. If you could check I didn't make it worse, that
would be great.

* I'm not sure extra_flags is the right way to pass the flag in 0003.
The "extra_" name is a bit weird, and no other table AM functions do it
this way and pass explicit bool flags instead. So my first "review"
commit does it like that. Do you agree it's better that way?

* The one question I'm somewhat unsure about is why Tom chose to use the
"wrong" recheck flag in the 2017 commit, when the correct recheck flag
is readily available. Surely that had a reason, right? But I can't think
of one ...

> While I was doing that, I realized that I should remove the call to
> table_rescan() from ExecReScanBitmapHeapScan() and just rely on the new
> table_rescan_bm() invoked from BitmapHeapNext(). That is done in the
> attached.
>
> 0010-0018 still need comments updated but I focused on getting the split
> out, reviewable version of them ready. I'll add comments (especially to
> 0011 table AM functions) tomorrow. I also have to double-check if I
> should add any asserts for table AMs about having implemented all of the
> new begin/re/endscan() functions.
>

I added a couple more comments for those patches (10-12). Chances are
the split in v16 clarifies some of my questions, but it'll have to wait
till the morning ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v15-review.tgz application/x-compressed-tar 32.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-04-06 01:07:51 Re: Allow non-superuser to cancel superuser tasks.
Previous Message James Coleman 2024-04-06 00:43:33 Re: Teach predtest about IS [NOT] <boolean> proofs