Expression push down from Join Node to below node.

From: Shubhankar Anand Kulkarni <shubhankar(dot)ak(at)zohocorp(dot)com>
To: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Cc: "shubhankarkul00" <shubhankarkul00(at)gmail(dot)com>
Subject: Expression push down from Join Node to below node.
Date: 2025-05-22 08:01:08
Message-ID: 196f702a715.7308025321701.1672670818480320441@zohocorp.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hackers,
 
While dealing with a few queries, I noticed that when the join expression (join clause) is used in projection as well, the expression will be computed twice.
For a better understanding, please take reference from the following example:

SELECT sensitive_data1, column1 FROM benchmark_encytion AS t1 LEFT JOIN ( SELECT aes256_cbc_decrypt( c1, '\x1234' :: bytea, '\x5678' :: bytea ) AS column1 FROM cipher ) AS t2 ON t1.sensitive_data1 = t2.column1;

As you can see in the above Query, the join clause involves the column which needs to be decrypted first and then joined with other table and then in projection we need the needed decrypted values to print as well, in this case the plan generated by the PG is as mentioned below (refer to the image as well):

                                                                                                        QUERY PLAN                                                                                                        

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Hash Right Join  (cost=22.74..73.43 rows=1 width=65)

   Output: t1.sensitive_data1, aes256_cbc_decrypt(cipher.c1, '4696e67'::bytea, '6e67'::bytea)

   Hash Cond: (aes256_cbc_decrypt(cipher.c1, '4696e67'::bytea, '6e67'::bytea) = t1.sensitive_data1)

   ->  Foreign Scan on public.cipher  (cost=0.00..50.68 rows=1 width=49)

         Output: cipher.c1, cipher.c2, cipher.c3, cipher.c4, cipher.c5, cipher.c6

         CStore Dir: /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/116931

         CStore Table Size: 2424 kB

   ->  Hash  (cost=22.72..22.72 rows=1 width=33)

         Output: t1.sensitive_data1

         ->  Foreign Scan on public.benchmark_encytion t1  (cost=0.00..22.72 rows=1 width=33)

               Output: t1.sensitive_data1

               CStore Dir: /home/shubha/Documents/zoho/postgres17/data/cstore_fdw/116838/135230

               CStore Table Size: 1268 kB

Query Identifier: 1810637692808683603

(14 rows)

As seen in the plan, join clause uses aes256_cbc_decrypt funcExpr to join columns and we are selecting the same as projection from hasjJoin node resulting in computing the expr twice, which is very costly.
 
My doubt here is, while planing this join, why can't we parse the join clause and pass the expressions involved there to the respective scan nodes and use it above that wherever needed as a Var? 
In this particular case, we can push_down the expression (decrypt funcExpr) from join clause to the foreign scan of cipher table. Why have we not handled this case in PG?
Pls share your thoughts on the same, also pls correct me if my understanding is wrong here.

Thanks and Regards.
Shubhankar  Kulkarni
ZLabs-CStore

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2025-05-22 08:12:24 Re: generic plans and "initial" pruning
Previous Message Daniil Davydov 2025-05-22 07:43:57 Re: POC: Parallel processing of indexes in autovacuum