| From: | "a(dot)rybakina" <a(dot)rybakina(at)postgrespro(dot)ru> | 
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org | 
| Cc: | "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Peter Eisentraut <peter(at)eisentraut(dot)org> | 
| Subject: | Re: POC, WIP: OR-clause support for indexes | 
| Date: | 2023-10-04 19:19:59 | 
| Message-ID: | 668892c1-fb11-3a79-ce5e-1c194b7b3263@postgrespro.ru | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 29.09.2023 20:35, a.rybakina wrote:
>>
>> I'm sorry I didn't write for a long time, but I really had a very 
>> difficult month, now I'm fully back to work.
>>
>> *I was able to implement the patches to the end and moved the 
>> transformation of "OR" expressions to ANY.* I haven't seen a big 
>> difference between them yet, one has a transformation before 
>> calculating selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the 
>> other after (v7.2-Replace-OR-clause-to-ANY.patch). Regression tests 
>> are passing, I don't see any problems with selectivity, nothing has 
>> fallen into the coredump, but I found some incorrect transformations. 
>> What is the reason for these inaccuracies, I have not found, but, to 
>> be honest, they look unusual). Gave the error below.
>>
>> In the patch, I don't like that I had to drag three libraries from 
>> parsing until I found a way around it.The advantage of this approach 
>> compared to the other ([1]) is that at this stage all possible or 
>> transformations are performed, compared to the patch, where the 
>> transformation was done at the parsing stage. That is, here, for 
>> example, there are such optimizations in the transformation:
>>
>>
>> I took the common element out of the bracket and the rest is 
>> converted to ANY, while, as noted by Peter Geoghegan, we did not have 
>> several bitmapscans, but only one scan through the array.
>>
>> postgres=# explain analyze SELECT p1.oid, p1.proname
>> FROM pg_proc as p1
>> WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR 
>> prolang = 13 AND prolang = 3;
>>                                               QUERY PLAN
>> -------------------------------------------------------------------------------------------------------
>>  Seq Scan on pg_proc p1  (cost=0.00..151.66 rows=1 width=68) (actual 
>> time=1.167..1.168 rows=0 loops=1)
>>    Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid, 
>> '2'::oid, '3'::oid])))
>>    Rows Removed by Filter: 3302
>>  Planning Time: 0.146 ms
>>  Execution Time: 1.191 ms
>> (5 rows)
>> *Falls into coredump at me:*
>> explain analyze SELECT p1.oid, p1.proname
>> FROM pg_proc as p1
>> WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;
>>
>
Hi, all!
I fixed the kernel dump issue and all the regression tests were 
successful, but I discovered another problem when I added my own 
regression tests.
Some queries that contain "or" expressions do not convert to "ANY". I 
have described this in more detail using diff as expected and real results:
diff -U3 
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out 
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out
--- 
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out 
2023-10-04 21:54:12.496282667 +0300
+++ 
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out 
2023-10-04 21:55:41.665422459 +0300
@@ -1925,17 +1925,20 @@
  EXPLAIN (COSTS OFF)
  SELECT count(*) FROM tenk1
    WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
-                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
+                                                        QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------
   Aggregate
     ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((thousand = 42) AND (tenthous = ANY 
('{1,3}'::integer[]))) OR (thousand = 41))
+         Recheck Cond: ((((thousand = 42) AND (tenthous = 1)) OR 
((thousand = 42) AND (tenthous = 3))) OR (thousand = 41))
           ->  BitmapOr
-               ->  Bitmap Index Scan on tenk1_thous_tenthous
-                     Index Cond: ((thousand = 42) AND (tenthous = ANY 
('{1,3}'::integer[])))
+               ->  BitmapOr
+                     ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Index Cond: ((thousand = 42) AND (tenthous = 1))
+                     ->  Bitmap Index Scan on tenk1_thous_tenthous
+                           Index Cond: ((thousand = 42) AND (tenthous = 3))
                 ->  Bitmap Index Scan on tenk1_thous_tenthous
                       Index Cond: (thousand = 41)
-(8 rows)
+(11 rows)
@@ -1946,24 +1949,50 @@
  EXPLAIN (COSTS OFF)
  SELECT count(*) FROM tenk1
+  WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+                                            QUERY PLAN
+---------------------------------------------------------------------------------------------------
+ Aggregate
+   ->  Bitmap Heap Scan on tenk1
+         Recheck Cond: ((thousand = 42) OR ((thousand = 42) AND 
(tenthous = 1)) OR (tenthous = 1))
+         ->  BitmapOr
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: (thousand = 42)
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: ((thousand = 42) AND (tenthous = 1))
+               ->  Bitmap Index Scan on tenk1_thous_tenthous
+                     Index Cond: (tenthous = 1)
+(10 rows)
+
+SELECT count(*) FROM tenk1
+  WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+ count
+-------
+    11
+(1 row)
+
+EXPLAIN (COSTS OFF)
+SELECT count(*) FROM tenk1
    WHERE hundred = 42 AND (thousand = 42 OR thousand = 99 OR tenthous < 
2) OR thousand = 41;
-                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
+                                                       QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------
   Aggregate
     ->  Bitmap Heap Scan on tenk1
-         Recheck Cond: (((hundred = 42) AND ((tenthous < 2) OR 
(thousand = ANY ('{42,99}'::integer[])))) OR (thousand = 41))
+         Recheck Cond: (((hundred = 42) AND ((thousand = 42) OR 
(thousand = 99) OR (tenthous < 2))) OR (thousand = 41))
           ->  BitmapOr
                 ->  BitmapAnd
                       ->  Bitmap Index Scan on tenk1_hundred
                             Index Cond: (hundred = 42)
                       ->  BitmapOr
                             ->  Bitmap Index Scan on tenk1_thous_tenthous
-                                 Index Cond: (tenthous < 2)
+                                 Index Cond: (thousand = 42)
                             ->  Bitmap Index Scan on tenk1_thous_tenthous
-                                 Index Cond: (thousand = ANY 
('{42,99}'::integer[]))
+                                 Index Cond: (thousand = 99)
+                           ->  Bitmap Index Scan on tenk1_thous_tenthous
+                                 Index Cond: (tenthous < 2)
                 ->  Bitmap Index Scan on tenk1_thous_tenthous
                       Index Cond: (thousand = 41)
-(14 rows)
+(16 rows)
diff -U3 
/home/alena/postgrespro__copy6/src/test/regress/expected/join.out 
/home/alena/postgrespro__copy6/src/test/regress/results/join.out
--- /home/alena/postgrespro__copy6/src/test/regress/expected/join.out 
2023-10-04 21:53:55.632069079 +0300
+++ /home/alena/postgrespro__copy6/src/test/regress/results/join.out 
2023-10-04 21:55:46.597485979 +0300
  explain (costs off)
  select * from tenk1 a join tenk1 b on
    (a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
    ((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
- QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
   Nested Loop
-   Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 = 
2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR 
(a.unique1 = 3))
+   Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 = 
1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND 
(b.hundred = 4)))
     ->  Seq Scan on tenk1 b
     ->  Materialize
           ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR 
(unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+               Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR 
(unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                 ->  BitmapOr
                       ->  Bitmap Index Scan on tenk1_unique1
                             Index Cond: (unique1 < 20)
                       ->  Bitmap Index Scan on tenk1_unique1
+                           Index Cond: (unique1 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique1
                             Index Cond: (unique1 = 1)
                       ->  Bitmap Index Scan on tenk1_unique2
-                           Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
-                     ->  Bitmap Index Scan on tenk1_unique1
-                           Index Cond: (unique1 = 3)
-(15 rows)
+                           Index Cond: (unique2 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique2
+                           Index Cond: (unique2 = 7)
+(17 rows)
  explain (costs off)
  select * from tenk1 a join tenk1 b on
    (a.unique1 < 20 or a.unique1 = 3 or a.unique1 = 1 and b.unique1 = 2) or
    ((a.unique2 = 3 or a.unique2 = 7) and b.hundred = 4);
- QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------
   Nested Loop
-   Join Filter: ((a.unique1 < 20) OR ((a.unique1 = 1) AND (b.unique1 = 
2)) OR ((a.unique2 = ANY ('{3,7}'::integer[])) AND (b.hundred = 4)) OR 
(a.unique1 = 3))
+   Join Filter: ((a.unique1 < 20) OR (a.unique1 = 3) OR ((a.unique1 = 
1) AND (b.unique1 = 2)) OR (((a.unique2 = 3) OR (a.unique2 = 7)) AND 
(b.hundred = 4)))
     ->  Seq Scan on tenk1 b
     ->  Materialize
           ->  Bitmap Heap Scan on tenk1 a
-               Recheck Cond: ((unique1 < 20) OR (unique1 = 1) OR 
(unique2 = ANY ('{3,7}'::integer[])) OR (unique1 = 3))
+               Recheck Cond: ((unique1 < 20) OR (unique1 = 3) OR 
(unique1 = 1) OR (unique2 = 3) OR (unique2 = 7))
                 ->  BitmapOr
                       ->  Bitmap Index Scan on tenk1_unique1
                             Index Cond: (unique1 < 20)
                       ->  Bitmap Index Scan on tenk1_unique1
+                           Index Cond: (unique1 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique1
                             Index Cond: (unique1 = 1)
                       ->  Bitmap Index Scan on tenk1_unique2
-                           Index Cond: (unique2 = ANY ('{3,7}'::integer[]))
-                     ->  Bitmap Index Scan on tenk1_unique1
-                           Index Cond: (unique1 = 3)
-(15 rows)
+                           Index Cond: (unique2 = 3)
+                     ->  Bitmap Index Scan on tenk1_unique2
+                           Index Cond: (unique2 = 7)
+(17 rows)
I haven't been able to fully deal with this problem yet
I have attached my experimental patch with the code.
| Attachment | Content-Type | Size | 
|---|---|---|
| 0001-Replace-OR-clause-to-ANY-expressions.diff | text/x-patch | 36.7 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2023-10-04 19:33:27 | Re: trying again to get incremental backup | 
| Previous Message | Robert Haas | 2023-10-04 19:10:05 | Re: Request for comment on setting binary format output per session |