Re: A performance issue with Memoize

From: Richard Guo <guofenglinux(at)gmail(dot)com>
To: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A performance issue with Memoize
Date: 2024-02-01 08:39:33
Message-ID: CAMbWs48ePZupHo01wAek9Qv+naf0ZY+CUF0fHW1OHXrXL_fMzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy <
anthonin(dot)bonnefoy(at)datadoghq(dot)com> wrote:

> Hi,
>
> I've seen a similar issue with the following query (tested on the current
> head):
>
> EXPLAIN ANALYZE SELECT * FROM tenk1 t1
> LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
> ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop Left Join (cost=8.46..1495.10 rows=50000 width=256)
> (actual time=0.860..111.013 rows=50000 loops=1)
> -> Bitmap Heap Scan on tenk1 t1 (cost=8.16..376.77 rows=500
> width=244) (actual time=0.798..1.418 rows=500 loops=1)
> Recheck Cond: (hundred < 5)
> Heap Blocks: exact=263
> -> Bitmap Index Scan on tenk1_hundred (cost=0.00..8.04
> rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
> Index Cond: (hundred < 5)
> -> Memoize (cost=0.30..4.89 rows=100 width=12) (actual
> time=0.009..0.180 rows=100 loops=500)
> Cache Key: t1.hundred
> Cache Mode: logical
> Hits: 0 Misses: 500 Evictions: 499 Overflows: 0 Memory Usage:
> 5kB
> -> Index Scan using tenk2_hundred on tenk2 (cost=0.29..4.88
> rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
> Index Cond: (hundred = t1.hundred)
> Planning Time: 0.661 ms
> Execution Time: 113.076 ms
> (14 rows)
>
> The memoize's cache key only uses t1.hundred while the nested loop has
> two changed parameters: the lateral var t1.two and t1.hundred. This
> leads to a chgParam that is always different and the cache is purged
> on each rescan.

Thanks for the report! This issue is caused by that we fail to check
PHVs for lateral references, and hence cannot know that t1.two should
also be included in the cache keys.

I reported exactly the same issue in [1], and verified that it can be
fixed by the patch in [2] (which seems to require a rebase).

[1]
https://www.postgresql.org/message-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun%2BVtrowcmxb41CzA%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CAMbWs49%2BCjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg%40mail.gmail.com

Thanks
Richard

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-02-01 09:03:28 Re: set_cheapest without checking pathlist
Previous Message Michael Paquier 2024-02-01 08:33:37 Re: Fix bugs not to discard statistics when changing stats_fetch_consistency