Re: Check lateral references within PHVs for memoize cache keys

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Check lateral references within PHVs for memoize cache keys
Date: 2023-12-25 07:01:51
Message-ID: CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun+Vtrowcmxb41CzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 13, 2023 at 3:12 PM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:

> So I'm wondering if it'd be better that we move all this logic of
> computing additional lateral references within PHVs to get_memoize_path,
> where we can examine only PHVs that are evaluated at innerrel. And
> considering that these lateral refs are only used by Memoize, it seems
> more sensible to compute them there. But I'm a little worried that
> doing this would make get_memoize_path too expensive.
>
> Please see v4 patch for this change.
>

I'd like to add that not checking PHVs for lateral references can lead
to performance regressions with Memoize node. For instance,

-- by default, enable_memoize is on
regression=# explain (analyze, costs off) select * from tenk1 t1 left join
lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.028..105245.547 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.011..3.760 rows=10000 loops=1)
-> Memoize (actual time=0.010..8.051 rows=5000 loops=10000)
Cache Key: t1.two
Cache Mode: logical
Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0 Memory
Usage: 1368kB
-> Seq Scan on tenk1 t2 (actual time=0.004..3.594 rows=5000
loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 1.943 ms
Execution Time: 106806.043 ms
(11 rows)

-- turn enable_memoize off
regression=# set enable_memoize to off;
SET
regression=# explain (analyze, costs off) select * from tenk1 t1 left join
lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.048..44831.707 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.026..2.340 rows=10000 loops=1)
-> Seq Scan on tenk1 t2 (actual time=0.002..3.282 rows=5000 loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.641 ms
Execution Time: 46472.609 ms
(7 rows)

As we can see, when Memoize enabled (which is the default setting), the
execution time increases by around 129.83%, indicating a significant
performance regression.

This is caused by that we fail to realize that 't1.four', which is from
the PHV, should be included in the cache keys. And that makes us have
to purge the entire cache every time we get a new outer tuple. This is
also implied by the abnormal Memoize runtime stats:

Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0

This regression can be fixed by the patch here. After applying the v4
patch, 't1.four' is added into the cache keys, and the same query runs
much faster.

regression=# explain (analyze, costs off) select * from tenk1 t1 left join
lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.060..20446.004 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.027..5.845 rows=10000 loops=1)
-> Memoize (actual time=0.001..0.209 rows=5000 loops=10000)
Cache Key: t1.two, t1.four
Cache Mode: binary
Hits: 9996 Misses: 4 Evictions: 0 Overflows: 0 Memory Usage:
5470kB
-> Seq Scan on tenk1 t2 (actual time=0.005..3.659 rows=5000
loops=4)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.579 ms
Execution Time: 21756.598 ms
(11 rows)

Comparing the first plan and the third plan, this query runs ~5 times
faster.

Thanks
Richard

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2023-12-25 08:07:28 Re: pg_basebackup has an accidentaly separated help message
Previous Message Michael Paquier 2023-12-25 06:42:41 Re: pg_basebackup has an accidentaly separated help message