Re: POC, WIP: OR-clause support for indexes

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-03-04 02:26:48
Message-ID: CACJufxE6eG3ATnTfO9ubs7CjyfgjpCQrSt85oU33EZj00PdcbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 29, 2024 at 4:59 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> On 28/2/2024 17:27, Alena Rybakina wrote:
> > Maybe like that:
> >
> > It also considers the way to generate a path using BitmapScan indexes,
> > converting the transformed expression into expressions separated by "OR"
> > operations, and if it turns out to be the best and finally selects the
> > best one.
> Thanks,
> I spent some time describing the feature with documentation.
> A condensed description of the GUC is in the runtime-config file.
> Feature description has spread between TransformOrExprToANY and
> generate_saop_pathlist routines.
> Also, I've made tiny changes in the code to look more smoothly.
> All modifications are integrated into the two new patches.
>
> Feel free to add, change or totally rewrite these changes.

diff --git a/src/backend/utils/misc/guc_tables.c
b/src/backend/utils/misc/guc_tables.c
index 93ded31ed9..7d3a1ca238 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -1026,6 +1026,17 @@ struct config_bool ConfigureNamesBool[] =
true,
NULL, NULL, NULL
},
+ {
+ {"enable_or_transformation", PGC_USERSET, QUERY_TUNING_OTHER,
+ gettext_noop("Transform a sequence of OR clauses to an IN expression."),
+ gettext_noop("The planner will replace clauses like 'x=c1 OR x=c2 .."
+ "to the clause 'x IN (c1,c2,...)'"),
+ GUC_EXPLAIN
+ },
+ &enable_or_transformation,
+ true,
+ NULL, NULL, NULL
+ },

I think it should be something like:
+ gettext_noop("Transform a sequence of OR expressions to an array
expression."),
+ gettext_noop("The planner will replace expression like 'x=c1 OR x=c2 "
+ "to the expression 'x = ANY( ARRAY[c1,c2])''"

+JumbleState *
+JumbleExpr(Expr *expr, uint64 *queryId)
+{
+ JumbleState *jstate = NULL;
+
+ Assert(queryId != NULL);
+
+ jstate = (JumbleState *) palloc(sizeof(JumbleState));
+
+ /* Set up workspace for query jumbling */
+ jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
+ jstate->jumble_len = 0;
+ jstate->clocations_buf_size = 32;
+ jstate->clocations = (LocationLen *)
+ palloc(jstate->clocations_buf_size * sizeof(LocationLen));
+ jstate->clocations_count = 0;
+ jstate->highest_extern_param_id = 0;
+
+ /* Compute query ID */
+ _jumbleNode(jstate, (Node *) expr);
+ *queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
+ jstate->jumble_len,
+ 0));
+
+ return jstate;
+}
queryId may not be a good variable name here?

comment `/* Compute query ID */`
seems not correct, here we are just hashing the expression?

+/*
+ * Dynahash match function to use in guc_hashtab
+ */
+static int
+orclause_match(const void *data1, const void *data2, Size keysize)
+{
+ OrClauseGroupKey *key1 = (OrClauseGroupKey *) data1;
+ OrClauseGroupKey *key2 = (OrClauseGroupKey *) data2;
+
+ Assert(sizeof(OrClauseGroupKey) == keysize);
+
+ if (key1->opno == key2->opno && key1->exprtype == key2->exprtype &&
+ equal(key1->expr, key2->expr))
+ return 0;
+
+ return 1;
+}
the above comments seem not correct?

<para>
Enables or disables the query planner's ability to lookup and
group multiple
similar OR expressions to ANY (<xref
linkend="functions-comparisons-any-some"/>) expressions.
The grouping technique of this transformation is based on the
similarity of variable sides.
It applies to equality expressions only. One side of such an expression
must be a constant clause, and the other must contain a variable clause.
The default is <literal>on</literal>.
Also, during BitmapScan paths generation it enables analysis of elements
of IN or ANY constant arrays to cover such clause with BitmapOr set of
partial index scans.
</para>
` It applies to equality expressions only.` seems not correct?
`select * from tenk1 where unique1 < 1 or unique1 < 2; ` can also do
the transformation.
`similarity of variable sides.` seems not correct,
should it be 'sameness of the variable sides`?

in [1], we can get:
expression IN (value [, ...])
is equivalent to
expression = value1
OR
expression = value2
OR

in [2], we can get:
SOME is a synonym for ANY. IN is equivalent to = ANY.

but still transforming OR to ANY is not intuitive.
a normal user may not know what is "transforming OR to ANY".
so maybe adding a simple example at
<varlistentry id="guc-enable-or-transformation"
xreflabel="enable_or_transformation">
would be great. which, I did at previous thread.

I also did some refactoring based on v18, attached.

[1] https://www.postgresql.org/docs/current/functions-comparisons.html#FUNCTIONS-COMPARISONS-IN-SCALAR
[2] https://www.postgresql.org/docs/current/functions-subquery.html#FUNCTIONS-SUBQUERY-ANY-SOME

Attachment Content-Type Size
v18-0001-Minor-miscellaneous-refactor-based-on-v18.no-cfbot application/octet-stream 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wenhui qiu 2024-03-04 02:33:00 Re: Support "Right Semi Join" plan shapes
Previous Message Peter Smith 2024-03-04 01:55:21 Re: Synchronizing slots from primary to standby