From: 萧鸿骏 <23031212454(at)stu(dot)xidian(dot)edu(dot)cn>
To: "Dilip Kumar" <dilipbalaut(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost
Date: 2025-05-16 08:10:38
Message-ID: 4ea619cb.2e6.196d825321b.Coremail.23031212454@stu.xidian.edu.cn
Lists: pgsql-bugs
Thank you very much! I will consider these in my future work.
2025-05-16 14:15:48 "Dilip Kumar" <dilipbalaut(at)gmail(dot)com> wrote:
> On Thu, May 15, 2025 at 6:42 PM 萧鸿骏 <23031212454(at)stu(dot)xidian(dot)edu(dot)cn> wrote:
> >
> > Thank you very much for your reply! May I assume that we need to adjust the cost parameters manually, based on the actual characteristics of the system and the test machine?
> >
> > I have recently been researching SQL performance issues in PostgreSQL, and many of the problems I am studying are related to this, so I would like to consult you. If the optimizer's default cost values are set unreliably, so that it selects a poor execution plan with a large performance difference, does that count as a legitimate direction for optimizer performance work? From a user's perspective this may be difficult to detect, leading to significant performance losses.
> >
>
> Multiple factors influence the planner parameters seq_page_cost and
> random_page_cost. PostgreSQL sets these to conservative default values
> that aim to strike a balance, not assuming either extreme. On one end,
> you might have a system where most data is on a slow spinning disk,
> while on the other, you may have a machine with large amounts of RAM
> such that almost all data is cached and disk I/O is rarely needed.
>
> These cost parameters directly influence the planner’s choice between
> access paths, for example, index scan vs sequential scan: An index
> scan involves random I/O, since it needs to jump around the heap to
> fetch matching rows by TID. A sequential scan reads the table
> linearly, which is generally faster on disk due to fewer seeks.
>
> Now, suppose your WHERE clause filters out 50% of the rows. The
> planner might estimate that an index scan would involve a high cost
> due to frequent random page reads, especially since the default
> random_page_cost is 4 times higher than seq_page_cost. As a result, it
> may choose a sequential scan as the cheaper plan.
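That estimate can be sketched as toy arithmetic (a deliberately simplified model; the real planner also charges per-tuple CPU costs, counts index pages, and accounts for correlation and effective_cache_size — the numbers below are purely illustrative):

```python
# Simplified sketch of the planner's I/O cost trade-off at 50% selectivity.
# Illustrative only: PostgreSQL's actual cost model is considerably richer.

def seq_scan_cost(table_pages, seq_page_cost=1.0):
    # A sequential scan reads every heap page once, linearly.
    return table_pages * seq_page_cost

def index_scan_cost(matching_rows, random_page_cost=4.0):
    # Pessimistic sketch: each matching row triggers a random heap fetch.
    return matching_rows * random_page_cost

table_pages = 10_000               # heap pages in the table
rows_per_page = 50
total_rows = table_pages * rows_per_page
matching_rows = total_rows // 2    # WHERE clause keeps 50% of the rows

seq = seq_scan_cost(table_pages)        # 10,000 pages * 1.0
idx = index_scan_cost(matching_rows)    # 250,000 rows * 4.0

assert seq < idx  # the sequential scan looks far cheaper to the planner
```

With the default 4:1 ratio between random_page_cost and seq_page_cost, the index path loses by a wide margin at this selectivity, which is why the planner picks the sequential scan.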
>
> However, if most of your data is already in RAM, there is no
> meaningful difference between random and sequential page reads; both
> are fast. In such a case, the planner’s assumptions (based on default
> cost values) can lead to a suboptimal plan, not because of a bug, but
> because it's working with inaccurate cost estimates relative to your
> hardware.
>
> So while the defaults work well for many systems, if you’re noticing
> suboptimal plans, especially on machines with a lot of RAM or fast
> SSDs, it's worth tuning these parameters. For systems that mostly
> serve static data and have high cache hit ratios, reducing both
> seq_page_cost and random_page_cost (and possibly making them equal)
> may help the planner make better decisions.
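As a rough illustration of how that break-even point shifts (toy numbers, not the planner's actual formula; in PostgreSQL the knobs are the seq_page_cost and random_page_cost settings, changed per session with SET or persistently with ALTER SYSTEM):

```python
# Toy comparison: lowering random_page_cost toward seq_page_cost makes
# the index path win much sooner. Numbers are illustrative only.

def cheaper_plan(table_pages, matching_rows, seq_page_cost, random_page_cost):
    seq_cost = table_pages * seq_page_cost       # read the whole heap linearly
    idx_cost = matching_rows * random_page_cost  # one random fetch per match
    return "index scan" if idx_cost < seq_cost else "seq scan"

# 10,000-page table, query matches 5,000 rows.
print(cheaper_plan(10_000, 5_000, 1.0, 4.0))  # defaults: 20,000 vs 10,000
print(cheaper_plan(10_000, 5_000, 1.0, 1.0))  # cached data: 5,000 vs 10,000
```

With the default ratio the sequential scan wins; once the two page costs are equal (as they effectively are for fully cached data), the same query favors the index scan.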
>
> This is just my opinion, and others may think differently.
>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com