Re: A performance issue with Memoize

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Richard Guo <guofenglinux(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A performance issue with Memoize
Date: 2024-02-01 07:43:14
Message-ID: CAO6_XqrYGoMJH1UmETZ+Y==uppQBsuEkiPdhZxeBOj61R=c1hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've seen a similar issue with the following query (tested on the current head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=8.46..1495.10 rows=50000 width=256)
(actual time=0.860..111.013 rows=50000 loops=1)
-> Bitmap Heap Scan on tenk1 t1 (cost=8.16..376.77 rows=500
width=244) (actual time=0.798..1.418 rows=500 loops=1)
Recheck Cond: (hundred < 5)
Heap Blocks: exact=263
-> Bitmap Index Scan on tenk1_hundred (cost=0.00..8.04
rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
Index Cond: (hundred < 5)
-> Memoize (cost=0.30..4.89 rows=100 width=12) (actual
time=0.009..0.180 rows=100 loops=500)
Cache Key: t1.hundred
Cache Mode: logical
Hits: 0 Misses: 500 Evictions: 499 Overflows: 0 Memory Usage: 5kB
-> Index Scan using tenk2_hundred on tenk2 (cost=0.29..4.88
rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
Index Cond: (hundred = t1.hundred)
Planning Time: 0.661 ms
Execution Time: 113.076 ms
(14 rows)

The memoize's cache key only uses t1.hundred while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different and the cache is purged
on each rescan.

Regards,
Anthonin

On Sat, Jan 27, 2024 at 5:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> Hello,
>
> 27.01.2024 00:09, Tom Lane wrote:
> > David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> >> On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> drongo and fairywren are consistently failing the test case added
> >>> by this commit. I'm not quite sure why the behavior of Memoize
> >>> would be platform-specific when we're dealing with integers,
> >>> but ...
> >> Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
> >> rather than "2"?
> >> Looking at the code in fmtfloat(), we fallback on the built-in snprintf.
> > Maybe ... I don't have a better theory.
>
> FWIW, I've found where this behaviour is documented:
> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170
>
> (I've remembered a case with test/sql/partition_prune from 2020, where
> sprintf on Windows worked the other way.)
>
>
> Best regards,
> Alexander
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-02-01 08:33:37 Re: Fix bugs not to discard statistics when changing stats_fetch_consistency
Previous Message Amul Sul 2024-02-01 07:17:27 Re: Add system identifier to backup manifest