
Re: Tid scan improvements

From:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To:	Edmund Horner <ejrh00(at)gmail(dot)com>
Cc:	PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Tid scan improvements
Date:	2018-11-06 03:40:55
Message-ID:	CAKJS1f9+etR1VGkqp=hkGWn3EofWJVFnjKaTpHZva2PKhALdng@mail.gmail.com
Lists:	pgsql-hackers

On 4 November 2018 at 17:20, Edmund Horner <ejrh00(at)gmail(dot)com> wrote:
> I have managed to split my changes into 4 patches:
>
> v3-0001-Add-selectivity-and-nullness-estimates-for-the-ItemP.patch
> v3-0002-Support-range-quals-in-Tid-Scan.patch
> v3-0003-Support-backward-scans-over-restricted-ranges-in-hea.patch
> v3-0004-Tid-Scan-results-are-ordered.patch

Hi,

I've been looking over 0001 to 0003. I ran out of steam before 0004.

I like the design of the new patch. From what I've thrown at the
selectivity estimation code so far, it seems pretty good. I also quite like
the design in nodeTidscan.c for range scans.

I didn't quite manage to wrap my head around the code that removes
redundant quals from the tidquals. For example, with:

postgres=# explain select * from t1 where ctid <= '(0,10)' and a = 0;
                    QUERY PLAN
--------------------------------------------------
 Tid Scan on t1  (cost=0.00..3.19 rows=1 width=4)
   TID Cond: (ctid <= '(0,10)'::tid)
   Filter: (a = 0)
(3 rows)

and:

postgres=# explain select * from t1 where ctid <= '(0,10)' or a = 20
and ctid >= '(0,0)';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Tid Scan on t1  (cost=0.01..176.18 rows=12 width=4)
   TID Cond: ((ctid <= '(0,10)'::tid) OR (ctid >= '(0,0)'::tid))
   Filter: ((ctid <= '(0,10)'::tid) OR ((a = 20) AND (ctid >= '(0,0)'::tid)))
(3 rows)

I understand why the 2nd query didn't remove the ctid quals from the
filter, and I understand why the first query could. I just didn't
manage to convince myself that the code behaves correctly for all
cases.

During my pass through 0001, 0002 and 0003 I noted the following:

0001:

1. I see a few instances of:

#define DatumGetItemPointer(X) ((ItemPointer) DatumGetPointer(X))
#define ItemPointerGetDatum(X) PointerGetDatum(X)

in both tid.c and ginfuncs.c, and I see you have:

+ itemptr = (ItemPointer) DatumGetPointer(constval);

Do you think it would be worth moving the macros out of tid.c and
ginfuncs.c into postgres.h and use that macro instead?

(I see the code in this file already did this, so it might not
matter.)

0002:

2. In TidCompoundRangeQualFromExpr() rlst is not really needed. You
can just return MakeTidRangeQuals(found_quals); or return NIL.

3. Can you explain why this only needs to take place when list_length() == 1?

    /*
     * In the case of a compound qual such as "ctid > ? AND ctid < ? AND ...",
     * the various parts will have come from different RestrictInfos. So
     * remove each part separately.
     */
    if (list_length(tidquals) == 1)
    {
        Node *qual = linitial(tidquals);

        if (and_clause(qual))
        {
            BoolExpr *and_qual = ((BoolExpr *) qual);

            scan_clauses = list_difference(scan_clauses, and_qual->args);
        }
    }

4. Accidental change?

- tidquals);
+ tidquals
+ );

5. Shouldn't this comment get changed?

- * NumTids number of tids in this scan
+ * NumRanges number of tids in this scan

6. There's no longer a field named NumTids

- * TidList evaluated item pointers (array of size NumTids)
+ * TidRanges evaluated item pointers (array of size NumTids)

7. The following field is not documented in TidScanState:

+ bool tss_inScan;

8. Can you name this exprtype instead?

+ TidExprType type; /* type of op */

"type" is used by Node types to indicate their type.

9. It would be neater to write this:

    if (expr->opno == TIDLessOperator || expr->opno == TIDLessEqOperator)
        tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
    else if (expr->opno == TIDGreaterOperator || expr->opno == TIDGreaterEqOperator)
        tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
    else
        tidopexpr->type = TIDEXPR_EQ;

    tidopexpr->exprstate = exprstate;

    tidopexpr->inclusive = expr->opno == TIDLessEqOperator ||
        expr->opno == TIDGreaterEqOperator;

as a switch:

    switch (expr->opno)
    {
        case TIDLessEqOperator:
            tidopexpr->inclusive = true;
            /* fall through */
        case TIDLessOperator:
            tidopexpr->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
            break;
        case TIDGreaterEqOperator:
            tidopexpr->inclusive = true;
            /* fall through */
        case TIDGreaterOperator:
            tidopexpr->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
            break;
        default:
            tidopexpr->type = TIDEXPR_EQ;
    }
    tidopexpr->exprstate = exprstate;
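
One thing to keep in mind with the switch form: unlike the if/else version
it never assigns inclusive = false, so it depends on the struct starting out
zeroed. The standalone sketch below checks that the two forms agree under
that assumption. The operator constants and the TidOpExpr layout are
stand-ins invented for this example (the real operator OIDs and struct come
from the PostgreSQL headers and the patch), and same_result() is a
hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in values; the real ones are operator OIDs. */
enum
{
    TIDEqualOperator = 1,
    TIDLessOperator,
    TIDLessEqOperator,
    TIDGreaterOperator,
    TIDGreaterEqOperator
};

typedef enum
{
    TIDEXPR_EQ,
    TIDEXPR_LOWER_BOUND,
    TIDEXPR_UPPER_BOUND
} TidExprType;

/* Stand-in with only the two fields that matter here. */
typedef struct TidOpExpr
{
    TidExprType type;
    bool        inclusive;
} TidOpExpr;

/* The if/else form: assigns both fields explicitly. */
static void
classify_if(TidOpExpr *e, int opno, bool invert)
{
    assert(e != NULL);
    if (opno == TIDLessOperator || opno == TIDLessEqOperator)
        e->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
    else if (opno == TIDGreaterOperator || opno == TIDGreaterEqOperator)
        e->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
    else
        e->type = TIDEXPR_EQ;
    e->inclusive = (opno == TIDLessEqOperator ||
                    opno == TIDGreaterEqOperator);
}

/* The switch form: only correct if the struct starts out zeroed. */
static void
classify_switch(TidOpExpr *e, int opno, bool invert)
{
    assert(e != NULL);
    memset(e, 0, sizeof(*e));   /* assumed: a zeroing allocation */
    switch (opno)
    {
        case TIDLessEqOperator:
            e->inclusive = true;
            /* fall through */
        case TIDLessOperator:
            e->type = invert ? TIDEXPR_LOWER_BOUND : TIDEXPR_UPPER_BOUND;
            break;
        case TIDGreaterEqOperator:
            e->inclusive = true;
            /* fall through */
        case TIDGreaterOperator:
            e->type = invert ? TIDEXPR_UPPER_BOUND : TIDEXPR_LOWER_BOUND;
            break;
        default:
            e->type = TIDEXPR_EQ;
            break;
    }
}

/* True when both forms classify (opno, invert) identically. */
static bool
same_result(int opno, bool invert)
{
    TidOpExpr a;
    TidOpExpr b;

    classify_if(&a, opno, invert);
    classify_switch(&b, opno, invert);
    return a.type == b.type && a.inclusive == b.inclusive;
}
```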

10. I don't quite understand this comment:

+ * Create an ExprState corresponding to the value part of a TID comparison,
+ * and wrap it in a TidOpExpr. Set the type and inclusivity of the TidOpExpr
+ * appropriately, depending on the operator and position of the its arguments.

I don't quite see how the code sets the inclusivity depending on the
position of the arguments.

Maybe the comment should be:

+ * For the given 'expr' build and return an appropriate TidOpExpr taking into
+ * account the expr's operator and operand order.

11. ScalarArrayOpExpr are commonly named "saop":

+static TidOpExpr *
+MakeTidScalarArrayOpExpr(ScalarArrayOpExpr *saex, TidScanState *tidstate)

(Though I see it's saex in other places in that file, so might not matter...)

12. You need to code SetTidLowerBound() with wraparound logic similar
to what you have in SetTidUpperBound().

It's perhaps unlikely, but the following shows incorrect results.

postgres=# select ctid from t1 where ctid > '(0,65535)' limit 1;
ctid
-------
(0,1)
(1 row)

-- the following is fine.

Time: 1.652 ms
postgres=# select ctid from t1 where ctid >= '(0,65535)' limit 1;
ctid
-------
(1,1)
(1 row)

Likely you can just advance to the next block when the offset is >
MaxOffsetNumber.

13. It looks like the previous code didn't make the assumption you're making in:

+ * A current-of TidExpr only exists by itself, and we should
+ * already have allocated a tidList entry for it. We don't
+ * need to check whether the tidList array needs to be
+ * resized.

I'm not sure if it's a good idea to lock the executor code into what
the grammar currently says is possible. The previous code didn't
assume that.

14. We pass 'false' to what?

+ * save the tuple and the buffer returned to us by the access methods in
+ * our scan tuple slot and return the slot. Note: we pass 'false' because
+ * tuples returned by heap_getnext() are pointers onto disk pages and were
+ * not created with palloc() and so should not be pfree()'d. Note also
+ * that ExecStoreHeapTuple will increment the refcount of the buffer; the
+ * refcount will not be dropped until the tuple table slot is cleared.
*/
-    return ExecClearTuple(slot);
+    if (tuple)
+        ExecStoreBufferHeapTuple(tuple,     /* tuple to store */
+                                 slot,      /* slot to store in */
+                                 scandesc->rs_cbuf);    /* buffer associated
+                                                         * with this tuple */
+    else
+        ExecClearTuple(slot);
+
+    return slot;

0003:

Saw nothing wrong.

0004:

Not yet reviewed.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: Tid scan improvements at 2018-11-04 04:20:32 from Edmund Horner

Responses

Re: Tid scan improvements at 2018-11-06 03:52:49 from Alvaro Herrera
Re: Tid scan improvements at 2018-11-09 02:01:33 from Edmund Horner
