From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Sabino Mullane <htamfids(at)gmail(dot)com>, Robert Treat <rob(at)xzilla(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-08 19:25:53
Message-ID: 8346d349-92b7-4725-8c07-8eb68b9d58c8@vondra.me
Lists: pgsql-hackers
On 10/8/25 19:23, Robert Haas wrote:
> On Wed, Oct 8, 2025 at 12:24 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> I don't think there's all that much disagreement, actually. This is a
>> pretty good illustration that we're using random_page_cost to account
>> for things other than "I/O cost" (like the expected cache hit ratios),
>> because we simply don't have a better knob for that.
>
> I agree with that conclusion.
>
>> Isn't this somewhat what effective_cache_size was meant to do? That
>> obviously does not know about what fraction of individual tables is
>> cached, but it does impose size limit.
>
> Not really, because effective_cache_size only models the fact that
> when you iterate the same index scan within the execution of a single
> query, it will probably hit some pages more than once. It doesn't have
> any idea that anything other than an index scan might hit the same
> pages more than once, and it doesn't have any idea that a query might
> find data in cache as a result of previous queries. Also, when it
> thinks the same page is accessed more than once, the cost of
> subsequent accesses is 0.
>
> I could be wrong, but I kind of doubt that there is any future in
> trying to generalize effective_cache_size. It's an extremely
> special-purpose mechanism, and what we need here is more of a general
> approach that can cut across the whole planner -- or alternatively we
> can decide that things are fine and that having rpc/spc implicitly
> model caching behavior is good enough.
>
>> I think in the past we mostly assumed we can't track cache size per
>> table, because we have no visibility into page cache. But maybe direct
>> I/O would change this?
>
> I think it's probably going to work out really poorly to try to use
> cache contents for planning. The plan may easily last much longer than
> the cache contents.
>
Why wouldn't that trigger invalidations / replanning just like other
types of stats? I imagine we'd regularly collect stats about what's
cached, etc., and we'd invalidate stale plans just like after ANALYZE.
Just a random idea, though. We'd need some sort of summary anyway; it's
not plausible each backend would collect this info on its own.
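To make that concrete, a purely hypothetical sketch (all names
invented) of how a sampled per-relation cache fraction might enter
page costing, instead of being folded into random_page_cost:

#include <math.h>
#include <stdio.h>

/* Hypothetical: expected cost of one random page access, given the
 * fraction of the relation believed to be cached. "cached_frac" would
 * come from a periodically sampled shared summary (with direct I/O,
 * shared_buffers would be the whole picture), and plans costed against
 * a stale value would get invalidated much like after ANALYZE. */
static double
effective_page_cost(double random_page_cost, double cached_frac)
{
    const double memory_page_cost = 0.01; /* cached page: ~CPU cost only */

    /* clamp the sampled fraction to [0, 1] */
    cached_frac = fmax(0.0, fmin(1.0, cached_frac));

    /* cached hits are nearly free, misses pay the full I/O price */
    return cached_frac * memory_page_cost +
           (1.0 - cached_frac) * random_page_cost;
}

int
main(void)
{
    /* a 90%-cached table: ~0.41 vs. the nominal 4.0 */
    printf("%.3f\n", effective_page_cost(4.0, 0.9));
    return 0;
}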
regards
--
Tomas Vondra