Re: Trying to pull up EXPR SubLinks

From: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Trying to pull up EXPR SubLinks
Date: 2020-04-24 03:25:14
Message-ID: CAKU4AWqLB5FbsvZG1uQ8cSMXF-qcHmO9zQHHFw35J6CaqYAGrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Actually I have a different opinion to handle this issue, to execute the
a > (select avg(a) from tinner where x = touer.x); The drawback of current
path is because it may calculates the same touer.x value multi-times. So
if we cache the values we have calculated before, we can avoid the cost.
Material path may be the one we can reference but it assumes all the tuples
in the tuplestore matches the input params, which is not the fact here.

But what if the input params doesn't change? If so we can use Material path
to optimize this case. But since we don't know if the if the input params
changed
or not during plan time, we just add the path (let's assume we can add it
with some
rules or cost calculation). If the input params is not changed, we use the
cached
values, if the input params changed, we can ReScan the Material node. To
optimize
the the cache invalidation frequent issue like (1, 2, 1, 2, 1, 2) case, we
may consider
a sort path to change the input values to (1, 1, 1, 2, 2, 2). But overall
it is a big effort.

As a independent small optimization maybe if the input params doesn't
change, we
can use the tuples in the Material node again. Suppose it will not
demage our current
framework if we can add the material path by either rules based or cost
based.

Suppose we have the following data:

demo=# select * from j1 limit 10;
i | im5 | im100 | im1000
----+-----+-------+--------
1 | 1 | 1 | 1
2 | 2 | 2 | 2
3 | 3 | 3 | 3
4 | 4 | 4 | 4
5 | 0 | 5 | 5
6 | 1 | 6 | 6
7 | 2 | 7 | 7
8 | 3 | 8 | 8
9 | 4 | 9 | 9
10 | 0 | 10 | 10
(10 rows)

totally we have j1 = 10,000,002 rows, the extra 2 rows because we have 3
rows for i=1
demo=# select * from j1 where i = 1;
i | im5 | im100 | im1000
---+-----+-------+--------
1 | 1 | 1 | 1
1 | 1 | 1 | 1
1 | 1 | 1 | 1
(3 rows)

Then select * from j1 j1o where im5 = (select avg(im5) from j1 where im5 =
j1o.im5) and i = 1;
will hit our above optimizations. The plan is

QUERY PLAN
-----------------------------------------------
Index Scan using j1_idx1 on j1 j1o
Index Cond: (i = 1)
Filter: ((im5)::numeric < (SubPlan 1))
SubPlan 1
-> Materialize
-> Aggregate
-> Seq Scan on j1
Filter: (im5 = j1o.im5)
(8 rows)

and the Aggregate is just executed once (execution time dropped from 8.x s
to 2.6s).

----
The attached is a very PoC patch, but it can represent my idea for
current discuss, Some notes about the implementation.

1. We need to check if the input params is really not changed. Currently
I just
comment it out for quick test.

- planstate->chgParam = bms_add_member(planstate->chgParam,
paramid);
+ // planstate->chgParam =
bms_add_member(planstate->chgParam, paramid);

Looks we have a lot of places to add a params
to chgParam without checking the actual value. The place I found this case
is
during ExecNestLoop. So we may need a handy and efficient way to do the
check for all the places. However it is not a must for current case

2. I probably misunderstand the the usage of MaterialState->eflags.
since I don't
know why the eflag need to be checked ExecMaterial. and I have to remove
it to
let my PoC work.

- if (tuplestorestate == NULL && node->eflags != 0)
+ if (tuplestorestate == NULL)

3. I added the material path in a very hacked way, the if check just to
make
sure it take effect on my test statement only. If you want to test this
patch locally,
you need to change the oid for your case.

+ if (linitial_node(RangeTblEntry, root->parse->rtable)->relid ==
25634)
+ best_path = (Path *) create_material_path(final_rel,
best_path);

But when we take this action to production case, how to cost this strategy
is
challenge since it can neither reduce the total_cost nor result in a new
PathKey.
I will check other place to see how this kind can be added.

Best Regards
Andy Fan

Attachment Content-Type Size
test.sql application/octet-stream 287 bytes
v1-0001-Add-a-Material-Path-for-subplan-and-reused-the-pr.patch application/octet-stream 3.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-04-24 03:43:51 Re: [BUG] non archived WAL removed during production crash recovery
Previous Message Atsushi Torikoshi 2020-04-24 02:56:01 Re: [Proposal] Add accumulated statistics for wait event