Re: heapgettup refactoring

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: heapgettup refactoring
Date: 2022-11-04 15:51:03
Message-ID: CAAKRu_ZJg_N7zHtWP+JoSY_hrce4+GKioL137Y2c2En-kuXQ7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the review!
Attached is v2 with feedback addressed.

On Mon, Oct 31, 2022 at 9:09 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > From 9d8b01960463dc64ff5b111d523ff80fce3017af Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 31 Oct 2022 13:40:06 -0400
> > Subject: [PATCH v1 2/3] Turn HeapKeyTest macro into function
> >
> > This should always be inlined appropriately now. It is easier to read as
> > a function. Also, remove unused include in catcache.c.
> > ---
> > src/backend/access/heap/heapam.c | 10 ++--
> > src/backend/utils/cache/catcache.c | 1 -
> > src/include/access/valid.h | 76 ++++++++++++------------------
> > 3 files changed, 36 insertions(+), 51 deletions(-)
> >
> > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
> > index 12be87efed..1c995faa12 100644
> > --- a/src/backend/access/heap/heapam.c
> > +++ b/src/backend/access/heap/heapam.c
> > @@ -716,8 +716,10 @@ heapgettup(HeapScanDesc scan,
> > snapshot);
> >
> > if (valid && key != NULL)
> > - HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
> > - nkeys, key, valid);
> > + {
> > + valid = HeapKeyTest(tuple, RelationGetDescr(scan->rs_base.rs_rd),
> > + nkeys, key);
> > + }
> >
> > if (valid)
> > {
>
> superfluous parens.

fixed.

> > From a894ce38c488df6546392b9f3bd894b67edf951e Mon Sep 17 00:00:00 2001
> > From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
> > Date: Mon, 31 Oct 2022 13:40:29 -0400
> > Subject: [PATCH v1 3/3] Refactor heapgettup* and heapgetpage
> >
> > Simplify heapgettup(), heapgettup_pagemode(), and heapgetpage(). All
> > three contained several unnecessary local variables, duplicate code, and
> > nested if statements. Streamlining these improves readability and
> > extensibility.
>
> It'd be nice to break this into slightly smaller chunks.

I can do that. Since incorporating feedback will be harder once I break
it up into smaller chunks, I'm inclined to wait to do so until I know
that the structure I have now is the one we will go with. (I know smaller
chunks will make it more reviewable.)

> > +
> > +static inline void heapgettup_no_movement(HeapScanDesc scan)
> > +{
>
> FWIW, for function definitions we keep the return type (and with that also the
> the "static inline") on a separate line.

Fixed

>
> > + ItemId lpp;
> > + OffsetNumber lineoff;
> > + BlockNumber page;
> > + Page dp;
> > + HeapTuple tuple = &(scan->rs_ctup);
> > + /*
> > + * ``no movement'' scan direction: refetch prior tuple
> > + */
> > +
> > + /* Since the tuple was previously fetched, needn't lock page here */
> > + if (!scan->rs_inited)
> > + {
> > + Assert(!BufferIsValid(scan->rs_cbuf));
> > + tuple->t_data = NULL;
> > + return;
>
> Is it possible to have a no-movement scan with an uninitialized scan? That
> doesn't really seem to make sense. At least that's how I understand the
> explanation for NoMovement nearby:
> * dir == NoMovementScanDirection means "re-fetch the tuple indicated
> * by scan->rs_ctup".
>
> We can't have a rs_ctup without an already started scan, no?
>
> Looks like this is pre-existing code that you just moved, but it still seems
> wrong.

Changed to an assert

>
> > + }
> > + page = ItemPointerGetBlockNumber(&(tuple->t_self));
> > + if (page != scan->rs_cblock)
> > + heapgetpage((TableScanDesc) scan, page);
>
>
> We have a
> BlockNumber page;
> and
> Page dp;
> in this code which seems almost intentionally confusing. This again is a
> pre-existing issue but perhaps we could clean it up first?

in attached
page -> block
dp -> page
in basically all locations in heapam.c (should that be its own commit?)

> > +static inline Page heapgettup_continue_page(HeapScanDesc scan, BlockNumber page, ScanDirection dir,
> > + int *linesleft, OffsetNumber *lineoff)
> > +{
> > + HeapTuple tuple = &(scan->rs_ctup);
>
> Hm. Finding the next offset via rs_ctup doesn't seem quite right. For one,
> it's not actually that cheap to extract the offset from an ItemPointer because
> of the the way we pack it into ItemPointerData.

So, it was like this before [1].
What about saving the lineoff in rs_cindex.

It is smaller, but that seems okay, right?

> > + Page dp = BufferGetPage(scan->rs_cbuf);
> > + TestForOldSnapshot(scan->rs_base.rs_snapshot, scan->rs_base.rs_rd, dp);
>
> Newlines between definitions and code :)

k

> Perhaps worth asserting that the scan is initialized and that rs_cbuf is valid?

indeed.

>
> > + if (ScanDirectionIsForward(dir))
> > + {
> > + *lineoff = OffsetNumberNext(ItemPointerGetOffsetNumber(&(tuple->t_self)));
> > + *linesleft = PageGetMaxOffsetNumber(dp) - (*lineoff) + 1;
>
> We can't access PageGetMaxOffsetNumber etc without holding a lock on the
> page. It's not immediately obvious that that is held in all paths.

In heapgettup() I call LockBuffer() before invoking
heapgettup_continue_page() and heapgettup_start_page() which are the
ones doing this.

I did have big plans for using the continue_page and start_page
functions in heapgettup_pagemode() as well, but since I'm not doing that
now, I can add in an expectation that the lock is held.

I added a comment saying the caller is responsible for acquiring the
lock if needed. I thought of adding an assert, but I don't see that
being done outside of bufmgr.c

BufferDesc *bufHdr = GetBufferDescriptor(buffer - 1);
Assert(LWLockHeldByMe(BufferDescriptorGetContentLock(bufHdr)));

> > +static inline BlockNumber heapgettup_initial_page(HeapScanDesc scan, ScanDirection dir)
> > +{
> > + Assert(!ScanDirectionIsNoMovement(dir));
> > + Assert(!scan->rs_inited);
>
> Is there a reason we couldn't set rs_inited in here, rather than reapeating
> that in all callers?

I wasn't sure if future callers or existing callers in the future may
need to do steps other than what is in heapgettup_initial_page() before
setting rs_inited. I thought of the responsibility of
heapgettup_initial_page() as returning the initial page to start the
scan. If it is going to do all initialization steps, perhaps the name
should change? I thought having a function that says it does
initialization of the scan might be confusing since initscan() also
exists.

> ISTM that this function should deal with the
> /*
> * return null immediately if relation is empty
> */
>
> logic, I think you now are repeating that check on every call to heapgettup().

So, that's a good point. If I move setting rs_inited inside of
heapgettup_initial_page(), then I can also easily move the empty table
check inside there too.

I don't want to set rs_inited before every return in
heapgettup_initial_page(). Do you think there are any issues with
setting it at the top of the function?

I thought about setting it at the very top (even before checking if the
relation is empty) Is it okay to set it before the empty table check?
rs_inited will be set to false at the bottom before returning. But,
maybe this will be an issue in other callers of
heapgettup_initial_page()?

Anyway, I have changed it in attached v2.

> > @@ -511,182 +711,55 @@ heapgettup(HeapScanDesc scan,
> > ScanKey key)
> > {
> > HeapTuple tuple = &(scan->rs_ctup);
> > - Snapshot snapshot = scan->rs_base.rs_snapshot;
> > - bool backward = ScanDirectionIsBackward(dir);
> > BlockNumber page;
> > - bool finished;
> > Page dp;
> > - int lines;
> > OffsetNumber lineoff;
> > int linesleft;
> > - ItemId lpp;
> > +
> > + if (ScanDirectionIsNoMovement(dir))
> > + return heapgettup_no_movement(scan);
>
> Maybe add an unlikely() - this path is barely ever used...

done.

> > + page = scan->rs_cblock;
> > + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
> > + dp = heapgettup_continue_page(scan, page, dir, &linesleft, &lineoff);
> > + goto continue_page;
> > }
> >
> > /*
> > * advance the scan until we find a qualifying tuple or run out of stuff
> > * to scan
> > */
> > - lpp = PageGetItemId(dp, lineoff);
> > - for (;;)
> > + while (page != InvalidBlockNumber)
> > {
> > + heapgetpage((TableScanDesc) scan, page);
> > + LockBuffer(scan->rs_cbuf, BUFFER_LOCK_SHARE);
> > + dp = heapgettup_start_page(scan, page, dir, &linesleft, &lineoff);
> > + continue_page:
>
>
> I don't like the goto continue_page at all. Seems that the paths leading here
> should call LockBuffer(), heapgettup_start_page() etc? Possibly a do {} while
> () loop could do the trick as well.

I don't see how a do while loop would solve help with the problem.
We need to check if the block number is valid after getting a block
assignment before doing heapgetpage() (e.g. after
heapgettup_initial_page() and after heapgettup_advance_page()).

Removing the goto continue_page means adding the heapgettpage(),
heapgettup_start_page(), etc code block in two places now (both after
heapgettup_initial_page() and after heapgettup_advance_page()) and, in
both locations we have to add an if statement to check if the block is
valid. I feel like this makes the function longer and harder to
understand. Keeping the loop as short as possible makes it clear what it
is doing. I think that with an appropriate warning comment, the goto
continue_page is clearer and easier to understand. To me, starting a
page at the top of the outer loop, then looping through the lines in the
page and is the structure that makes the most sense.

- Melanie

[1] https://github.com/postgres/postgres/blob/master/src/backend/access/heap/heapam.c#L572

Attachment Content-Type Size
v2-0001-Remove-breaks-in-HeapTupleSatisfiesVisibility.patch application/octet-stream 1.4 KB
v2-0002-Turn-HeapKeyTest-macro-into-function.patch application/octet-stream 3.7 KB
v2-0003-Refactor-heapgettup-and-heapgetpage.patch application/octet-stream 32.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2022-11-04 15:58:34 Re: Add non-blocking version of PQcancel
Previous Message Peter Eisentraut 2022-11-04 15:45:35 Re: psql: Add command to use extended query protocol