Re: Returning nbtree posting list TIDs in DESC order during backwards scans

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andy Fan <zhihuifan1213(at)163(dot)com>, Mircea Cadariu <cadariu(dot)mircea(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Returning nbtree posting list TIDs in DESC order during backwards scans
Date: 2025-12-03 12:31:43
Message-ID: 9037658C-159A-44B7-973A-093F6F18706A@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Peter,

The patch v4 looks carefully written and technically solid, and the core logic (switching killedItems[] to Bitmapset and reworking backwards posting list scans) is coherent.

I just got a few comments/questions:

> On Dec 3, 2025, at 11:08, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> Attached is v4, which does it that way.
>
> My plan is to commit this improved version in the next couple of days.
>
> --
> Peter Geoghegan
> <v4-0001-Return-TIDs-in-desc-order-during-backwards-scans.patch>

1
```
- /* Always invalidate so->killedItems[] before leaving so->currPos */
- so->numKilled = 0;
```

The old code only sets so->numKilled to 0 and reuse memory of so->killedItems[], now the new code always bms_free(so->killedItems) and re-alloc memory when next time adding a member to bms.

I am think that, if there were bms_clear(), then we could have just cleared the bitmap and reused the memory next time. But unfortunately, there is no such a bms_clear() now. What do you think to add bms_clear() and use it here? I don’t want to do that, I can try that once you push this patch.

2
```
+ /* Set up posting list state (and remember last TID) */
itemIndex--;
tupleOffset =
_bt_setuppostingitems(so, itemIndex, offnum,
- BTreeTupleGetPostingN(itup, 0),
+ BTreeTupleGetPostingN(itup, nitems - 1),
itup);
- /* Remember additional TIDs */
- for (int i = 1; i < BTreeTupleGetNPosting(itup); i++)
+
+ /* Remember all prior TIDs (must be at least one) */
+ for (int i = nitems - 2; i >= 0; i--)
{
itemIndex--;
_bt_savepostingitem(so, itemIndex, offnum,
```

I wonder if the comment “must be at lease one” should apply to the assignment of tupleOffset? The “for” loop starts from nitems-2, will it still must be at lease one item?

3
```
/*
- * Don't bother advancing the outermost loop's int iterator to
- * avoid processing killed items that relate to the same
- * offnum/posting list tuple. This micro-optimization hardly
- * seems worth it. (Further iterations of the outermost loop
- * will fail to match on this same posting list's first heap
- * TID instead, so we'll advance to the next offnum/index
- * tuple pretty quickly.)
+ * Don't advance itemIndex for outermost loop, no matter how
+ * nextIndex was advanced. It's possible that items whose
+ * TIDs weren't matched in posting list can still be killed
+ * (there might be a later tuple whose TID is a match).
*/
if (j == nposting)
killtuple = true;
```

I really don’t get what "Don't bother advancing the outermost loop's int iterator” means? Here we only set killtuple to true, then if (killtuple && !ItemIdIsDead(iid)), it breaks the inner while loop, in that case, outer while loop "while ((itemIndex = bms_next_member(so->killedItems, itemIndex)) >= 0)” will advance itemIndex.

I know this is a legacy comment, if you can explain a little bit, that will be very appreciated.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ignat Remizov 2025-12-03 12:37:26 Re: [PATCH] Add enable_copy_program GUC to control COPY PROGRAM
Previous Message Hayato Kuroda (Fujitsu) 2025-12-03 12:27:06 RE: Using MyDatabaseId in SET_LOCKTAG_APPLY_TRANSACTION