From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-08 16:24:33
Message-ID: 46307eb5-f691-4251-b71e-f33bcf5e5375@vondra.me
Lists: pgsql-hackers

On 10/8/25 17:20, Robert Haas wrote:
> On Tue, Oct 7, 2025 at 3:15 PM Greg Sabino Mullane <htamfids(at)gmail(dot)com> wrote:
>> I've been doing this sort of thing for clients a long time, and I always test both directions when I come across a query that should be faster. For real-world queries, 99% of them have no change or improve with a lowered rpc, and 99% get worse via a raised rpc. So color me unconvinced.
>
> Color me equally unconvinced. Tomas seems quite convinced that we
> ought to be raising random_page_cost rather than lowering it, and that
> absolutely does not correspond to my experience in any way. It's not
> that I've actually tested raising the random page cost on very many
> systems, mind you. It's that my experience with real-world systems is
> that we tend to pick sequential scans when we should use an index, not
> the other way around. And obviously that problem will never be fixed
> by raising random_page_cost, since that will tend to favor sequential
> scans even more.
>
> Tomas's test involves scanning big tables that don't fit in RAM, and I
> agree that if that's most of what you do, you might benefit from a
> higher random_page_cost. However, even users who have some tables that
> are a lot bigger than RAM also tend to have frequently-accessed tables
> that are much smaller. For example, if you join a fact table to a
> bunch of dimension tables, the fact table may not fit in RAM, but the
> dimension tables probably do, and we're using the same
> random_page_cost for all of those tables. Moreover, we're least likely
> to make a wrong decision about what to do about the big table. It will
> often be the case that the query will be phrased such that a
> sequential scan on the big table is unavoidable, and if it is possible
> to avoid a sequential scan, we're going to want to do so almost
> regardless of random_page_cost because the number of page accesses we
> can save will tend to be large. On a smaller table, it's more likely
> that both a full table scan and an index-based approach will be
> competitive, so there the value of random_page_cost will matter more
> to the final outcome. So, in this scenario, it's more important that
> the random_page_cost is close to accurate for the smaller tables
> rather than for the larger ones, and those are the ones that are most
> likely to benefit from caching.
>
I don't think there's all that much disagreement, actually. This is a
pretty good illustration that we're using random_page_cost to account
for things other than "I/O cost" (like the expected cache hit ratios),
because we simply don't have a better knob for that.
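FWIW the finest storage-level granularity we have for this today is the
per-tablespace override (plus plain SET for ad-hoc experiments). A rough
sketch of how I'd test a query both ways - the table, column and
tablespace names are entirely made up:

  -- compare the same query under both assumptions in one session
  SET random_page_cost = 1.1;
  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;

  SET random_page_cost = 4.0;
  EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42;

  -- or bake a different assumption into the tablespace holding the
  -- large, mostly-uncached tables
  ALTER TABLESPACE cold_data SET (random_page_cost = 4.0);

But that's still one number per tablespace, not per table, and it says
nothing about how much of each table is actually cached.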
> One of the planner changes that I think would be worth exploring is to
> have the system try to estimate the percentage of a given table that
> is likely to be in cache, and that could be configured via a reloption
> or estimated based on the size of the table (or maybe even the
> frequency of access, though that is fraught, since it can change
> precipitously on short time scales and is thus not great to use for
> planning). If we estimated that small tables are likely to be cached
> and bigger ones are likely to be less-cached or, if very big,
> completely uncached, then it would probably be right to raise
> random_page_cost as well. But without that kind of a change, the
> correct value of random_page_cost is the one that takes into account
> both the possibility of caching and the possibility of nothing being
> cached, and this test is rotated all the way toward one end of that
> spectrum.
>
Isn't this somewhat what effective_cache_size was meant to do? That
obviously does not know what fraction of individual tables is cached,
but it does impose a size limit.
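To spell that out, it's a single cluster-wide GUC that IIRC only enters
the index_pages_fetched() estimate, so every table gets the same
assumption. A quick experiment (sizes and table name made up):

  SET effective_cache_size = '256MB';
  EXPLAIN SELECT * FROM orders WHERE customer_id BETWEEN 1 AND 10000;

  SET effective_cache_size = '64GB';
  EXPLAIN SELECT * FROM orders WHERE customer_id BETWEEN 1 AND 10000;

The second one should generally be more willing to choose the index
scan, because more of the repeated page accesses are assumed to hit
cache.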
I think in the past we mostly assumed we can't track cache size per
table, because we have no visibility into the page cache. But maybe
direct I/O would change this?
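For the shared_buffers part we can already get per-relation numbers,
it's only the kernel cache we're blind to. Something along these lines,
using the pg_buffercache contrib module (the LIMIT and the 8kB block
size are just for illustration):

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  SELECT c.relname,
         count(*) AS buffers,
         pg_size_pretty(count(*) * 8192) AS cached  -- assumes 8kB blocks
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = pg_relation_filenode(c.oid)
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
  GROUP BY c.relname
  ORDER BY buffers DESC
  LIMIT 10;

With direct I/O the shared buffers would be the whole story, so a
per-table "fraction cached" estimate would suddenly be within reach.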
regards
--
Tomas Vondra