A performance regression issue with Memoize

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: A performance regression issue with Memoize
Date: 2025-07-28 08:13:45
Message-ID: CAMbWs4_9c=D=3ZCFUyG3RuXhRx5SwyXicZNKxjhNHjdPzfdBzg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I ran into a query that shows a performance regression related to the
Memoize node.

create table t (a int, b int, c int);
insert into t select i%3, i, i from generate_series(1,500)i;
analyze t;

explain (analyze, costs off, timing off)
select * from t t1 join lateral
(select t2.a, t2.b, t1.a x from t t2, t t3 offset 0) s
on s.a = t1.a;

with enable_memoize set to on:

Planning Time: 2.470 ms
Execution Time: 98869.240 ms

with enable_memoize set to off:

Planning Time: 1.791 ms
Execution Time: 55754.080 ms

This shows a 77.3% performance regression with Memoize enabled.

The stats of the Memoize node shows some clues:

-> Memoize (actual rows=83334.00 loops=500)
Cache Key: t1.a
Cache Mode: binary
Hits: 0 Misses: 500 Evictions: 498 Overflows: 0 Memory
Usage: 8193kB

There are 0 cache hits, and too many cache evictions.

So I suspect that during the phase of filling the Memoize cache, the
memory usage exceeds the specified limit, causing cache entries to be
repeatedly evicted.

While cost_memoize_rescan() does account for the eviction ratio when
estimating the cost of Memoize, the estimate does not seem to be
accurate enough in this case to prevent the planner from choosing a
Memoize node.

Any thoughts on how we might improve this?

Thanks
Richard

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jean-Christophe Arnu 2025-07-28 08:23:07 Re: restore_command return code behaviour
Previous Message Jean-Christophe Arnu 2025-07-28 08:01:41 Re: restore_command return code behaviour