Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Lukas Fittl <lukas(at)fittl(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment
Date: 2025-03-20 12:04:43
Message-ID: fbf77241-e4c5-4f70-ac71-8bf8cff89f39@tantorlabs.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 20.03.2025 13:37, David Rowley wrote:
> est_entries is a uint32, so %u is the correct format character for that type.
>
> I don't think you need to prefix all these properties with "Cache"
> just because the other two properties have that prefix. I also don't
> think the names you've chosen really reflect the meaning. How about
> something like: "Estimated Distinct Lookup Keys: 721 Estimated
> Capacity: 655", in that order. I think maybe having that as one line
> for format=text is better than 2 lines. The EXPLAIN output is already
> often taking up more lines in v18 than in v17, would be good to not
> make that even worse unnecessarily.

LGTM

> I see the existing code there could use ExplainPropertyText rather
> than have a special case for text and non-text formats. That's likely
> my fault. If we're touching this code, then we should probably tidy
> that up. Do you want to create a precursor fixup patch for that?

I fully agree with this suggestion. Then I'll begin with this on another
new thread.

> + double ndistinct; /* Estimated number of distinct memoization keys,
> + * used for cache size evaluation. Kept for EXPLAIN */
>
> Maybe this field in MemoizePath needs a better name. How about
> "est_unique_keys"? and also do the rename in struct Memoize.

LGTM

> I'm also slightly concerned about making struct Memoize bigger. I had
> issues with a performance regression [1] for 908a96861 when increasing
> the WindowAgg struct size last year and the only way I found to make
> it go away was to shuffle the fields around so that the struct size
> didn't increase. I think we'll need to see a benchmark of a query that
> hits Memoize quite hard with a small cache size to see if the
> performance decreases as a result of adding the ndistinct field. It's
> unfortunate that we'll not have the luxury of squeezing this double
> into padding if we do see a slowdown.

I tried rearranging the fields in the structure with 'ndistinct' field,
but unfortunately, sizeof(Memoize) did not remain the same. Therefore,
benchmarking Memoize is definitely necessary. Thanks for the information!

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2025-03-20 12:19:57 Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
Previous Message Peter Eisentraut 2025-03-20 11:59:29 Re: Index AM API cleanup