Re: Should we update the random_page_cost default value?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Sabino Mullane <htamfids(at)gmail(dot)com>
Cc: Robert Treat <rob(at)xzilla(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should we update the random_page_cost default value?
Date: 2025-10-07 19:46:28
Message-ID: 5qum3e5rzblbuw4m5cb7g6l5d6y2gip4ltz6pouutqkljc645r@ibvg5dygvqgl
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-10-07 15:20:37 -0400, Greg Sabino Mullane wrote:
> On Tue, Oct 7, 2025 at 3:15 PM Greg Sabino Mullane <htamfids(at)gmail(dot)com>
> wrote:
>
> > One of the take-away lessons from this thread for me is that the TPC-*
> >> benchmarks are far removed from real world queries. (Maybe if we ask an LLM
> >> to use an ORM to implement TPC-H? Ha ha ha!)
> >
> >
> To be clear, I'm saying that TPC queries are written by sane adults that
> know what they are doing, but perhaps lowering rpc tends to help more when
> the queries are not well-written SQL (which many consultants would argue is
> the majority of production queries).

I think this discrepancy is largely due to the fact that Tomas' is testing
with a cold cache (he has numbers for both), whereas most production workloads
have very high cache hit ratios. Also most production postgres workloads are
not heavily on the analytics side, in semi-transactional workloads switching
to a sequential scan "too early" is *way way* worse than staying with a index
scan for a bit longer than makes sense. The switch to a seqscan will often
make the query dramatically more expensive, whereas staying with the index
scan increases costs incrementally. Add to that the bane of fast-start plans
that really can't be disabled other than making seq scans relatively more
expensive...

I rather doubt we'll find a particularly satisfying answer to this without a
better answer to take into account how many blocks will already be cached at
the start of query execution - we only really model keeping blocks cached
across repeated accesses within one query execution. Just lowering
seq_page_cost & random_page_cost to account for IO being cheap (due to being
cached) is a pretty bad answer, as it doesn't really allow for accurate
costing of some queries having high hit ratio and others a poor one.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Ramsey 2025-10-07 20:14:27 Re: Add RESPECT/IGNORE NULLS and FROM FIRST/LAST options
Previous Message Sami Imseih 2025-10-07 19:40:02 Re: Add mode column to pg_stat_progress_vacuum