Re: POC, WIP: OR-clause support for indexes

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Peter Geoghegan <pg(at)bowt(dot)ie>, "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2024-02-10 05:00:00
Message-ID: CACJufxGfWnv9Q+sJ9iOR4w8si0cEa8wuOw=-+qRjrq4ZKdEhLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 8, 2024 at 1:34 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>
> On 3/2/2024 02:06, Alena Rybakina wrote:
> > On 01.02.2024 08:00, jian he wrote:
> > I added your code to the patch.
> Thanks Alena and Jian for the detailed scrutiny!
>
> A couple of questions:
> 1. As I see, transformAExprIn uses the same logic as we invented but
> allows composite and domain types. Could you add a comment explaining
> why we forbid row types in general, in contrast to the transformAExprIn
> routine?
> 2. Could you provide the tests to check issues covered by the recent (in
> v.15) changes?
>
> Patch 0001-* in the attachment incorporates changes induced by Jian's
> notes from [1].
> Patch 0002-* contains a transformation of the SAOP clause, which allows
> the optimizer to utilize partial indexes if they cover all values in
> this array. Also, it is an answer to Alexander's note [2] on performance
> degradation. This first version may be a bit raw, but I need your
> opinion: Does it resolve the issue?
yes. It resolved the partial index performance degradation issue.
The v16, 0002 extra code overhead is limited.

Here is how I test it.
drop table if exists test;
create table test as (select (random()*100)::int x, (random()*1000) y
from generate_series(1,1000000) i);
create index test_x_1_y on test (y) where x = 1;
create index test_x_2_y on test (y) where x = 2;
create index test_x_3_y on test (y) where x = 3;
create index test_x_4_y on test (y) where x = 4;
create index test_x_5_y on test (y) where x = 5;
create index test_x_6_y on test (y) where x = 6;
create index test_x_7_y on test (y) where x = 7;
create index test_x_8_y on test (y) where x = 8;
create index test_x_9_y on test (y) where x = 9;
create index test_x_10_y on test (y) where x = 10;

set enable_or_transformation to on;
explain(analyze, costs off)
select * from test
where (x = 1 or x = 2 or x = 3 or x = 4 or x = 5 or x = 6 or x = 7 or
x = 8 or x = 9 or x = 10);

set enable_or_transformation to off;
explain(analyze, costs off)
select * from test
where (x = 1 or x = 2 or x = 3 or x = 4 or x = 5 or x = 6 or x = 7 or
x = 8 or x = 9 or x = 10);

FAILED: src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o
ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include
-I../../Desktop/pg_src/src8/postgres/src/include
-I/usr/include/libxml2 -fdiagnostics-color=always --coverage
-D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -O0 -g
-fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE
-Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wimplicit-fallthrough=3
-Wcast-function-type -Wshadow=compatible-local -Wformat-security
-Wdeclaration-after-statement -Wno-format-truncation
-Wno-stringop-truncation -Wunused-variable -Wuninitialized
-Werror=maybe-uninitialized -Wreturn-type
-DWRITE_READ_PARSE_PLAN_TREES -DCOPY_PARSE_PLAN_TREES
-DREALLOCATE_BITMAPSETS -DRAW_EXPRESSION_COVERAGE_TEST
-fno-omit-frame-pointer -fPIC -pthread -DBUILDING_DLL -MD -MQ
src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o -MF
src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o.d -o
src/backend/postgres_lib.a.p/optimizer_path_indxpath.c.o -c
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c:
In function ‘build_paths_for_SAOP’:
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c:1267:33:
error: declaration of ‘pd’ shadows a previous local
[-Werror=shadow=compatible-local]
1267 | PredicatesData *pd = (PredicatesData *) lfirst(lc);
| ^~
../../Desktop/pg_src/src8/postgres/src/backend/optimizer/path/indxpath.c:1235:29:
note: shadowed declaration is here
1235 | PredicatesData *pd;
| ^~
cc1: all warnings being treated as errors
[32/126] Compiling C object src/backend/postgres_lib.a.p/utils_adt_ruleutils.c.o
ninja: build stopped: subcommand failed.

+ if (!predicate_implied_by(index->indpred, list_make1(rinfo1), true))
+ elog(ERROR, "Logical mistake in OR <-> ANY transformation code");
the error message seems not clear?
What is a "Logical mistake"?

static List *
build_paths_for_SAOP(PlannerInfo *root, RelOptInfo *rel, RestrictInfo *rinfo,
List *other_clauses)
I am not sure what's `other_clauses`, and `rinfo` refers to? adding
some comments would be great.

struct PredicatesData needs some comments, I think.

+bool
+saop_covered_by_predicates(ScalarArrayOpExpr *saop, List *predicate_lists)
+{
+ ListCell *lc;
+ PredIterInfoData clause_info;
+ bool result = false;
+ bool isConstArray;
+
+ Assert(IsA(saop, ScalarArrayOpExpr));
is this Assert necessary?

For the function build_paths_for_SAOP, I think I understand the first
part of the code.
But I am not 100% sure of the second part of the `foreach(lc,
predicate_lists)` code.
more comments in `foreach(lc, predicate_lists)` would be helpful.

do you need to add `PredicatesData` to src/tools/pgindent/typedefs.list?

I also did some minor refactoring of generate_saop_pathlist.

type_is_rowtype does not check if the type is array type.
transformBoolExprOr the OR QUAL, the Const part cannot be an array.
simple example:
alter table tenk1 add column arr int[];
set enable_or_transformation to on;
EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1
WHERE arr = '{1,2,3}' or arr = '{1,2}';

instead of let it go to `foreach (lc, entries)`,
we can reject the Const array at `foreach(lc, expr->args)`

also `foreach(lc, expr->args)` do we need to reject cases like
`contain_subplans((Node *) nconst_expr)`?
maybe let the nconst_expr be a Var node would be far more easier.

Attachment Content-Type Size
v1-0001-minor-refactor-generate_saop_pathlist.no-cfbot application/octet-stream 1.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2024-02-10 07:40:43 backend *.c #include cleanup (IWYU)
Previous Message Andres Freund 2024-02-10 04:33:23 Re: Popcount optimization using AVX512