Optimize common expressions in projection evaluation

From: Peifeng Qiu <pgsql(at)qiupf(dot)dev>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Optimize common expressions in projection evaluation
Date: 2022-12-02 07:39:07
Message-ID: CAPH51beXmxKzjS2mF_ifrBT1UY=JcWbU9RHZWwpnfhwiaCeGZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers.

When a star(*) expands into multiple fields, our current
implementation is to generate multiple copies of the expression
and do FieldSelects. This is very inefficient because the same
expression get evaluated multiple times but actually we only need
to do it once. This is stated in ExpandRowReference().

For example:
CREATE TABLE tbl(c1 int, c2 int, c3 int);
CREATE TABLE src(v text);
CREATE FUNCTION expensive_func(input text) RETURNS t;
INSERT INTO tbl SELECT (expensive_func(v)).* FROM src;

This is effectively the same as:
INSERT INTO tbl SELECT (expensive_func(v)).c1,
(expensive_func(v)).c2, (expensive_func(v)).c3 FROM src;

In this form, expensive_func will be evaluated for every column in
tbl. If tbl has hundreds of columns we are in trouble. To partially
solve this issue, when doing projection in ExecBuildProjectionInfo,
instead of generating normal steps for FieldSelects one by one, we
can group them by the expression(arg of FieldSelect node). Then
evaluate the epxression once to get a HeapTuple, deform it into
fields, and then assign needed fields in one step. I've attached
patch that introduce EEOP_FIELD_MULTI_SELECT_ASSIGN for this.

With this patch, the following query should generate only one
NOTICE, instead of 3.

CREATE TYPE proj_type AS (a int, b int, c text);
CREATE OR REPLACE FUNCTION proj_type_func1(input text)
RETURNS proj_type AS $$
BEGIN
RAISE NOTICE 'proj_type_func called';
RETURN ROW(1, 2, input);
END
$$ IMMUTABLE LANGUAGE PLPGSQL;
CREATE TEMP TABLE stage_table(a text);
INSERT INTO stage_table VALUES('aaaa');
SELECT (proj_type_func1(a)).* FROM stage_table;

This patch is just proof of concept. Some unsolved questions I
can think of right now:
- Carry some information in FieldSelect from ExpandRowReference
to assist grouping?
- This can only handle FuncExpr as the root node of FieldSelect
arg. What about a more general expression?
- How to determine whether a common expression is safe to be
optimized this way? Any unexpcted side-effects?

Any thoughts on this approach?

Best regards,
Peifeng Qiu

Attachment Content-Type Size
0001-Optimize-common-expressions-in-projection-evaluation.patch text/x-patch 15.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2022-12-02 07:41:11 Re: ExecRTCheckPerms() and many prunable partitions
Previous Message Drouvot, Bertrand 2022-12-02 07:36:38 Re: Introduce a new view for checkpointer related stats