Re: Should we update the random_page_cost default value?

From: Jim Nasby <jnasby(at)upgrade(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-31 19:12:18
Message-ID: CAMFBP2omp3v6pcThHJnW7zjYak6Dqob-__QY-OLuJfqNPyY6fA@mail.gmail.com
Lists: pgsql-hackers

I definitely agree with Tom's idea that the costing model needs a re-think,
if for no other reason than that the sheer volume of discussion on this
thread has (AFAICT) reached precious few conclusions. Along those lines, the
two biggest problems I see with the current state of affairs are:

1) We blindly take the lowest-cost plan, regardless of how close #2 is.
More importantly, we do nothing to model how accurate the cost estimate is.
In the realm of OLTP, stability is FAR, *FAR* more important than absolute
efficiency. Short of forcing specific query plans, the next best thing for
stability would be to take the estimated accuracy of the estimates into
account: I'd much rather have a plan with a 2x higher estimate but 90%
confidence than a plan that's half the cost but with only 10% confidence
(see the sketch after point 2 below).

When it comes to OLAP, it might be useful to have the ability to start
executing several different query plans, and monitor how well reality
matches the cost estimates (though dealing with nodes that have to process
all data before producing tuples would be a bit of a challenge). If a
query's going to run for 30 minutes, letting a few copies of it run for 30
seconds to evaluate things could well be a fruitful tradeoff.

2) We have essentially no ability to actually measure how well things are
working. The thread has shown that even building controlled tests is
difficult, given all the moving parts in the executor that we have little
visibility into (just one example: the actual cost of deforming tuples).
Getting useful data from users' systems is at least as difficult: even an
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) doesn't tell you anything about how the
planner decided on a plan or what the alternatives looked like, and it's
also missing the data you'd need to judge whether the cost coefficients are
anywhere close to reasonable.
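
Coming back to point 1: here's roughly the kind of comparison I have in
mind. This is a standalone, made-up sketch, not planner code; the
"confidence" field and the risk_aversion knob don't exist anywhere today.

#include <stdio.h>

/*
 * Hypothetical sketch only: pick the plan with the lowest risk-adjusted
 * cost rather than the lowest point estimate.  Neither "confidence" (0..1)
 * nor "risk_aversion" exists in the planner; they're invented here.
 */
typedef struct CandidatePlan
{
    const char *name;
    double      est_cost;       /* planner's point estimate */
    double      confidence;     /* how much we trust est_cost, 0..1 */
} CandidatePlan;

static double
risk_adjusted_cost(const CandidatePlan *p, double risk_aversion)
{
    /* inflate the estimate as confidence drops */
    return p->est_cost * (1.0 + risk_aversion * (1.0 - p->confidence));
}

int
main(void)
{
    CandidatePlan a = {"2x cost, 90% confidence", 200.0, 0.90};
    CandidatePlan b = {"half cost, 10% confidence", 100.0, 0.10};
    double      risk_aversion = 4.0;    /* how much we value stability */

    double      ca = risk_adjusted_cost(&a, risk_aversion);
    double      cb = risk_adjusted_cost(&b, risk_aversion);

    printf("%s -> %.0f\n%s -> %.0f\n", a.name, ca, b.name, cb);
    printf("winner: %s\n", ca <= cb ? a.name : b.name);
    return 0;
}

A multiplicative penalty is obviously just one of many possible schemes;
the point is only that the choice stops being a bare cost-vs-cost check.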

Speaking of cost coefficients... I'm kinda doubtful of any effort to change
the default of random_page_cost that doesn't also look at the cpu_* GUCs.
Even assuming a sequential page read takes 0.01ms, the default
cpu_tuple_cost of 0.01 implies only about 100ns to process a tuple. Even if
that's reasonable, the default cpu_operator_cost of 0.0025 then implies
about 25ns per operator invocation, a single number that somehow has to
cover everything from a trivial int or float comparison to a
collation-aware text comparison, which differ in real cost by an order of
magnitude or more.
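
Spelling that arithmetic out (the 0.01ms page read is purely an assumed
baseline, not a measurement):

#include <stdio.h>

/*
 * Back-of-the-envelope conversion from cost units to wall-clock time,
 * assuming (purely for illustration) that seq_page_cost = 1.0 corresponds
 * to a 0.01ms sequential page read.
 */
int
main(void)
{
    double      seq_page_ms = 0.01;         /* assumed, not measured */
    double      cpu_tuple_cost = 0.01;      /* default GUC value */
    double      cpu_operator_cost = 0.0025; /* default GUC value */

    printf("implied per-tuple time:    %.0f ns\n",
           seq_page_ms * 1e6 * cpu_tuple_cost);     /* 100 ns */
    printf("implied per-operator time: %.0f ns\n",
           seq_page_ms * 1e6 * cpu_operator_cost);  /* 25 ns */
    return 0;
}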

Given the pain involved in trying to manage all these coefficients, I do
wonder if there's any practical way to determine them empirically on a
given system, perhaps using clock cycle counts. That could potentially help
with #2 (and might also allow estimating what percentage of I/O is being
served from cache outside of shared buffers, as opposed to real I/O).
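
Just to give a flavor of what that could look like, here's a standalone toy
(not a proposal for actual planner code): it uses clock_gettime() rather
than raw cycle counters, and a real calibration would also have to measure
tuple deforming, fmgr overhead, and actual page reads.

#include <stdio.h>
#include <stdint.h>
#include <time.h>

static double
now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

int
main(void)
{
    const int64_t iters = 100000000;
    volatile int64_t sink = 0;      /* defeat dead-code elimination */
    double      start = now_ns();

    for (int64_t i = 0; i < iters; i++)
        sink += (i < iters / 2);    /* stand-in for one operator call */

    printf("~%.2f ns per comparison (divide by a measured page-read time\n"
           "to get a candidate cpu_operator_cost in planner units)\n",
           (now_ns() - start) / (double) iters);
    (void) sink;
    return 0;
}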

On Fri, Oct 31, 2025 at 10:41 AM Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> In thinking more about how we can improve random_page_cost, I now realize
> that we will never perfectly model the caching effects, and even if we
> did, the possible time gap between planning and execution would render
> it unreliable anyway, i.e., prepared queries.
>
> I think the best we could do is to model how _volatile_ the cache is,
> meaning how often are we removing and adding items to the shared buffer
> cache, and somehow mix that to the random_page_cost value. That value
> might be fairly stable over a longer period of time.
>

In my experience on OLTP systems, that volatility is effectively quite low,
because these systems generally start behaving quite poorly once the
critical "working set" of data exceeds what fits in cache (shared_buffers
plus the filesystem cache), so on a healthy system the working set stays
resident. That certainly may not be true at the level of shared_buffers
alone, and of course over time the specific pages living in cache will
slowly change.

One important caveat to that though: it only applies to the (usually rather
small) set of queries that are run all the time by the application.
Infrequently run queries can have radically different behavior, but their
performance is generally not very critical (as long as they don't impact
the rest of the system).
