Re: Delay Memoize hashtable build until executor run

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Delay Memoize hashtable build until executor run
Date: 2024-01-29 23:25:02
Message-ID: CAApHDvp+5gdgc4Znjc3AT2J_Q6py1MjPQ=kPjMH1KaL5PVnZUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 26 Jan 2024 at 19:54, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> Currently, nodeMemoize.c builds the hashtable for the cache during
> executor startup. This is not what is done in hash joins. I think we
> should make the two behave the same way.

I ran a few benchmarks on this, mostly for archive purposes.

-- Test 1: Demonstrate there is a problem

drop table if exists t,r;
create table t (a int);
create table r (a int primary key);
insert into t select x%5000000 from generate_Series(1,20000000)x;
insert into r select x from generate_Series(0,4999999)x;
vacuum analyze t,r;
set work_mem='1GB';
set enable_hashjoin=0;
set enable_mergejoin=0;
set max_parallel_workers_per_gather=0;
\timing on

explain (summary on) select count(*) from t inner join r on t.a=r.a;

set enable_memoize=1;

-- I'm including Planning Time just to show that the extra time is not
spent in planning
Planning Time: 0.094 ms -> Time: 53.061 ms
Planning Time: 0.093 ms -> Time: 53.064 ms
Planning Time: 0.095 ms -> Time: 69.682 ms

set enable_memoize=0;

Planning Time: 0.113 ms -> Time: 0.438 ms
Planning Time: 0.111 ms -> Time: 0.436 ms
Planning Time: 0.113 ms -> Time: 0.445 ms

Conclusion: There's a problem

-- Patched with memoize on
Planning Time: 0.116 ms -> Time: 0.472 ms
Planning Time: 0.118 ms -> Time: 0.444 ms
Planning Time: 0.117 ms -> Time: 0.443 ms

Conclusion: The patch fixes the problem

-- Test 2: Make sure we're not slowing things down by checking the
table exists each tuple

drop table if exists t,r;

create table t (a int);
create table r (a int primary key);

insert into t select 1 from generate_series(1,1000000);
insert into r select x from generate_series(1,1000000)x;
vacuum analyze t,r;

set enable_hashjoin=0;
set enable_mergejoin=0;
set enable_memoize=1;
set max_parallel_workers_per_gather=0;

-- only 1 cache miss so that we hammer the cache hit code as hard as we can
-- with the smallest hash table possible so lookups are very fast.
explain (analyze, timing off) select count(*) from t inner join r on t.a=r.a;

-- Master
Execution Time: 206.403 ms
Execution Time: 211.472 ms
Execution Time: 204.688 ms

-- Patched
Execution Time: 205.967 ms
Execution Time: 206.406 ms
Execution Time: 205.061 ms

Conclusion: No slowdown.

I'll push this change to master only as there don't seem to have been
any complaints. We can reconsider that if someone complains.

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-01-29 23:36:16 Re: Make documentation builds reproducible
Previous Message Tatsuo Ishii 2024-01-29 23:22:57 When extended query protocol ends?