Re: Tid scan improvements

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Edmund Horner <ejrh00(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Tid scan improvements
Date: 2021-02-17 20:45:19
Message-ID: CAApHDvqfLgx4xaYcJg3wgnW-fe2B-qLUKt00rkPz7QMunQL6+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for having a look at this.

On Wed, 17 Feb 2021 at 11:05, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2021-02-04 23:51:39 +1300, David Rowley wrote:
> > and
> > bool (*scan_tid_range_nextslot) (TableScanDesc sscan,
> > ScanDirection direction,
> > TupleTableSlot *slot);
> >
> > I added an additional function in tableam.h that does not have a
> > corresponding API function:
> >
> > static inline TableScanDesc
> > table_tid_range_start(Relation rel, Snapshot snapshot,
> > ItemPointer mintid,
> > ItemPointer maxtid)
> >
> > This just calls the standard scan_begin then calls scan_set_tid_range
> > setting the specified mintid and maxtid.
>
> Hm. But that means we can't rescan?

It might not be perfect, but to rescan, we must call table_rescan()
then table_set_tidrange() before calling the
table_scan_getnextslot_tidrange() function.

> > +bool
> > +heap_getnextslot_tidrange(TableScanDesc sscan, ScanDirection direction,
> > + TupleTableSlot *slot)
> > +{
> > + HeapScanDesc scan = (HeapScanDesc) sscan;
> > + ItemPointer mintid = &sscan->rs_mintid;
> > + ItemPointer maxtid = &sscan->rs_maxtid;
> > +
> > + /* Note: no locking manipulations needed */
> > + for (;;)
> > + {
> > + if (sscan->rs_flags & SO_ALLOW_PAGEMODE)
> > + heapgettup_pagemode(scan, direction, sscan->rs_nkeys, sscan->rs_key);
> > + else
> > + heapgettup(scan, direction, sscan->rs_nkeys, sscan->rs_key);
> > +
> > + if (scan->rs_ctup.t_data == NULL)
> > + {
> > + ExecClearTuple(slot);
> > + return false;
> > + }
> > +
> > + /*
> > + * heap_set_tidrange will have used heap_setscanlimits to limit the
> > + * range of pages we scan to only ones that can contain the TID range
> > + * we're scanning for. Here we must filter out any tuples from these
> > + * pages that are outwith that range.
> > + */
> > + if (ItemPointerCompare(&scan->rs_ctup.t_self, mintid) < 0)
> > + {
> > + ExecClearTuple(slot);
> > +
> > + /*
> > + * When scanning backwards, the TIDs will be in descending order.
> > + * Future tuples in this direction will be lower still, so we can
> > + * just return false to indicate there will be no more tuples.
> > + */
> > + if (ScanDirectionIsBackward(direction))
> > + return false;
> > +
> > + continue;
> > + }
> > +
> > + /*
> > + * Likewise for the final page, we must filter out TIDs greater than
> > + * maxtid.
> > + */
> > + if (ItemPointerCompare(&scan->rs_ctup.t_self, maxtid) > 0)
> > + {
> > + ExecClearTuple(slot);
> > +
> > + /*
> > + * When scanning forward, the TIDs will be in ascending order.
> > + * Future tuples in this direction will be higher still, so we can
> > + * just return false to indicate there will be no more tuples.
> > + */
> > + if (ScanDirectionIsForward(direction))
> > + return false;
> > + continue;
> > + }
> > +
> > + break;
> > + }
> > +
> > + /*
> > + * if we get here it means we have a new current scan tuple, so point to
> > + * the proper return buffer and return the tuple.
> > + */
> > + pgstat_count_heap_getnext(scan->rs_base.rs_rd);
>
> I wonder if there's an argument for counting the misses above via
> pgstat_count_heap_fetch()? Probably not, right?

I'm a bit undecided about that. In theory, we're doing the heap
fetches of tuples on the target page which are outside of the range so
we should maybe count them. On the other hand, it might be a little
confusing for very observant users if they see the fetches going up
for the tuples we skip over in TID Range scans.

> > +#define IsCTIDVar(node) \
> > + ((node) != NULL && \
> > + IsA((node), Var) && \
> > + ((Var *) (node))->varattno == SelfItemPointerAttributeNumber && \
> > + ((Var *) (node))->varlevelsup == 0)
> > +
> > +typedef enum
> > +{
> > + TIDEXPR_UPPER_BOUND,
> > + TIDEXPR_LOWER_BOUND
> > +} TidExprType;
> > +
> > +/* Upper or lower range bound for scan */
> > +typedef struct TidOpExpr
> > +{
> > + TidExprType exprtype; /* type of op; lower or upper */
> > + ExprState *exprstate; /* ExprState for a TID-yielding subexpr */
> > + bool inclusive; /* whether op is inclusive */
> > +} TidOpExpr;
> > +
> > +/*
> > + * For the given 'expr', build and return an appropriate TidOpExpr taking into
> > + * account the expr's operator and operand order.
> > + */
> > +static TidOpExpr *
> > +MakeTidOpExpr(OpExpr *expr, TidRangeScanState *tidstate)
> > +{
> > + Node *arg1 = get_leftop((Expr *) expr);
> > + Node *arg2 = get_rightop((Expr *) expr);
> > + ExprState *exprstate = NULL;
> > + bool invert = false;
> > + TidOpExpr *tidopexpr;
> > +
> > + if (IsCTIDVar(arg1))
> > + exprstate = ExecInitExpr((Expr *) arg2, &tidstate->ss.ps);
> > + else if (IsCTIDVar(arg2))
> > + {
> > + exprstate = ExecInitExpr((Expr *) arg1, &tidstate->ss.ps);
> > + invert = true;
> > + }
> > + else
> > + elog(ERROR, "could not identify CTID variable");
> > +
> > + tidopexpr = (TidOpExpr *) palloc(sizeof(TidOpExpr));
> > + tidopexpr->inclusive = false; /* for now */
> > +
> > + switch (expr->opno)
> > + {
> > + case TIDLessEqOperator:
> > + tidopexpr->inclusive = true;
> > + /* fall through */
> > + case TIDLessOperator:
> > + tidopexpr->exprtype = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
> > + break;
> > + case TIDGreaterEqOperator:
> > + tidopexpr->inclusive = true;
> > + /* fall through */
> > + case TIDGreaterOperator:
> > + tidopexpr->exprtype = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
> > + break;
> > + default:
> > + elog(ERROR, "could not identify CTID operator");
> > + }
> > +
> > + tidopexpr->exprstate = exprstate;
> > +
> > + return tidopexpr;
> > +}
> > +
> > +/*
> > + * Extract the qual subexpressions that yield TIDs to search for,
> > + * and compile them into ExprStates if they're ordinary expressions.
> > + */
> > +static void
> > +TidExprListCreate(TidRangeScanState *tidrangestate)
> > +{
> > + TidRangeScan *node = (TidRangeScan *) tidrangestate->ss.ps.plan;
> > + List *tidexprs = NIL;
> > + ListCell *l;
> > +
> > + foreach(l, node->tidrangequals)
> > + {
> > + OpExpr *opexpr = lfirst(l);
> > + TidOpExpr *tidopexpr;
> > +
> > + if (!IsA(opexpr, OpExpr))
> > + elog(ERROR, "could not identify CTID expression");
> > +
> > + tidopexpr = MakeTidOpExpr(opexpr, tidrangestate);
> > + tidexprs = lappend(tidexprs, tidopexpr);
> > + }
> > +
> > + tidrangestate->trss_tidexprs = tidexprs;
> > +}
>
> Architecturally it feels like this is something that really belongs more
> into plan time?

Possibly. It would mean TidOpExpr would have to become a Node type.
TID Range scan is really just following the lead set by TID Scan here.

I'm not sure if it's worth the trouble making these Node types for the
small gains there'd be in the performance of having the planner make
them once for prepared queries rather than them having to be built
during each execution.

Do you think it is?

> > +/*
> > + * table_set_tidrange resets the minimum and maximum TID range to scan for a
> > + * TableScanDesc created by table_beginscan_tidrange.
> > + */
> > +static inline void
> > +table_set_tidrange(TableScanDesc sscan, ItemPointer mintid,
> > + ItemPointer maxtid)
> > +{
> > + /* Ensure table_beginscan_tidrange() was used. */
> > + Assert((sscan->rs_flags & SO_TYPE_TIDRANGESCAN) != 0);
> > +
> > + sscan->rs_rd->rd_tableam->scan_set_tidrange(sscan, mintid, maxtid);
> > +}
>
> How does this interact with rescans?

We must call table_rescan() before calling table_set_tidrange() again.
That perhaps could be documented better. I'm just unsure if that
should be documented in tableam.h or if it's a restriction that only
needs to exist in heapam.c

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-02-17 20:56:24 Re: new heapcheck contrib module
Previous Message Thomas Munro 2021-02-17 20:42:04 Re: Finding cause of test fails on the cfbot site