Materialiation is slower than non-materialized

From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Materialiation is slower than non-materialized
Date: 2015-03-23 10:01:40
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB77159876742@szxeml521-mbs.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

During my routine work, I observed that incase of execution of plan having inner node of NLJ as materialized node (on top of SeqScan) is slower compared to non-materialized SeqScan node. This happens only if "Work_mem is not big enough to hold all tuples in memory."

To make test easy and faster, I set the work_mem as 256kB. Then result is as below:

=========With Material off=============
postgres=# set enable_material to off;
SET
Time: 0.225 ms
postgres=# select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
count
----------
49995000
(1 row)

Time: 26674.299 ms
postgres=# explain select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=2783478.33..2783478.34 rows=1 width=4)
-> Nested Loop (cost=0.00..2700145.00 rows=33333333 width=4)
Join Filter: (tbl.id1 < tbl2.id1)
-> Seq Scan on tbl (cost=0.00..145.00 rows=10000 width=4)
-> Seq Scan on tbl2 (cost=0.00..145.00 rows=10000 width=4)
Planning time: 0.120 ms
(6 rows)

=========With Material on=============

postgres=# set enable_material to on;
SET
Time: 0.222 ms
postgres=# select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
count
----------
49995000
(1 row)

Time: 32839.627 ms
postgres=# explain select count(tbl.id1) from tbl, tbl2 where tbl.id1<tbl2.id1;
QUERY PLAN
----------------------------------------------------------------------------
Aggregate (cost=1983648.33..1983648.34 rows=1 width=4)
-> Nested Loop (cost=0.00..1900315.00 rows=33333333 width=4)
Join Filter: (tbl.id1 < tbl2.id1)
-> Seq Scan on tbl (cost=0.00..145.00 rows=10000 width=4)
-> Materialize (cost=0.00..235.00 rows=10000 width=4)
-> Seq Scan on tbl2 (cost=0.00..145.00 rows=10000 width=4)
Planning time: 0.140 ms
(7 rows)

As per my analysis, above result is aligned with our current design.

Materialization Node:
Cost Calculation @ Plan time:
If the results spills over to disk in case of Materialization, it considers the cost for the same in total cost.
Actual Execution:
Result is actually fetched from disk only even on re-scan.

Scan Node:
Cost Calculation @ Plan time:
The cost of re-scan of SeqScan node is considered to be same scan of SeqScan node, which always assumes that the records is fetched from disk and hence disk access cost is added (As we don't know really how much memory will be available to cache during execution).
Actual Execution:
After first scan, once the whole records is loaded to memory (provided shared_buffer is big enough), rescan of records are read from memory only and hence it is much faster.

So because of this while planning cost of Materialized node is lesser than that of SeqScan node but while execution SeqScan is faster because it fetches tuples from memory on re-scan.

I am not sure if we can consider this to be a problem or not but I just wanted to share as generally it is expected by user to be Materialization faster than Non-materialized.
Please provide your opinion. If we can do something about this then I can take up this work.

Thanks and Regards,
Kumar Rajeev Rastogi
------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from HUAWEI, which
is intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by
phone or email immediately and delete it!

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-03-23 11:50:40 Exposing PG_VERSION_NUM in pg_config
Previous Message Peter Geoghegan 2015-03-23 06:51:30 Re: Display of multi-target-table Modify plan nodes in EXPLAIN