Quick Links

Re: Delay Memoize hashtable build until executor run

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Delay Memoize hashtable build until executor run
Date:	2024-01-29 23:25:02
Message-ID:	CAApHDvp+5gdgc4Znjc3AT2J_Q6py1MjPQ=kPjMH1KaL5PVnZUQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, 26 Jan 2024 at 19:54, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Currently, nodeMemoize.c builds the hashtable for the cache during
> executor startup. This is not what is done in hash joins. I think we
> should make the two behave the same way.

I ran a few benchmarks on this, mostly for archive purposes.

-- Test 1: Demonstrate there is a problem

drop table if exists t,r;
create table t (a int);
create table r (a int primary key);
insert into t select x%5000000 from generate_Series(1,20000000)x;
insert into r select x from generate_Series(0,4999999)x;
vacuum analyze t,r;
set work_mem='1GB';
set enable_hashjoin=0;
set enable_mergejoin=0;
set max_parallel_workers_per_gather=0;
\timing on

explain (summary on) select count(*) from t inner join r on t.a=r.a;

set enable_memoize=1;

-- I'm including Planning Time just to show that the extra time is not
spent in planning
Planning Time: 0.094 ms -> Time: 53.061 ms
Planning Time: 0.093 ms -> Time: 53.064 ms
Planning Time: 0.095 ms -> Time: 69.682 ms

set enable_memoize=0;

Planning Time: 0.113 ms -> Time: 0.438 ms
Planning Time: 0.111 ms -> Time: 0.436 ms
Planning Time: 0.113 ms -> Time: 0.445 ms

Conclusion: There's a problem

-- Patched with memoize on
Planning Time: 0.116 ms -> Time: 0.472 ms
Planning Time: 0.118 ms -> Time: 0.444 ms
Planning Time: 0.117 ms -> Time: 0.443 ms

Conclusion: The patch fixes the problem

-- Test 2: Make sure we're not slowing things down by checking the
table exists each tuple

drop table if exists t,r;

create table t (a int);
create table r (a int primary key);

insert into t select 1 from generate_series(1,1000000);
insert into r select x from generate_series(1,1000000)x;
vacuum analyze t,r;

set enable_hashjoin=0;
set enable_mergejoin=0;
set enable_memoize=1;
set max_parallel_workers_per_gather=0;

-- only 1 cache miss so that we hammer the cache hit code as hard as we can
-- with the smallest hash table possible so lookups are very fast.
explain (analyze, timing off) select count(*) from t inner join r on t.a=r.a;

-- Master
Execution Time: 206.403 ms
Execution Time: 211.472 ms
Execution Time: 204.688 ms

-- Patched
Execution Time: 205.967 ms
Execution Time: 206.406 ms
Execution Time: 205.061 ms

Conclusion: No slowdown.

I'll push this change to master only as there don't seem to have been
any complaints. We can reconsider that if someone complains.

David

In response to

Delay Memoize hashtable build until executor run at 2024-01-26 06:54:16 from David Rowley

Responses

Re: Delay Memoize hashtable build until executor run at 2024-01-29 23:37:51 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2024-01-29 23:36:16	Re: Make documentation builds reproducible
Previous Message	Tatsuo Ishii	2024-01-29 23:22:57	When extended query protocol ends?