| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Chengpeng Yan <chengpeng_yan(at)Outlook(dot)com> |
| Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, John Naylor <johncnaylorls(at)gmail(dot)com> |
| Subject: | Re: Add a greedy join search algorithm to handle large join problems |
| Date: | 2025-12-09 23:30:47 |
| Message-ID: | cb313155-24c4-4838-a46b-44968993a6e2@vondra.me |
| Lists: | pgsql-hackers |
On 12/9/25 20:20, Tomas Vondra wrote:
> On 12/2/25 14:04, Chengpeng Yan wrote:
>> Hi,
>>
>>
>>
>>> On Dec 2, 2025, at 18:56, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>>
>>> I think a much broader evaluation will be needed, comparing not just the
>>> planning time, but also the quality of the final plan. Which for the
>>> starjoin tests does not really matter, as the plans are all equal in
>>> this regard.
>>
>>
>> Many thanks for your feedback.
>>
>> You are absolutely right — plan quality is also very important. In my
>> initial email I only showed the improvements in planning time, but did
>> not provide results regarding plan quality. I will run tests on more
>> complex join scenarios, evaluating both planning time and plan quality.
>>
>
> I was trying to do some simple experiments by comparing plans for TPC-DS
> queries, but unfortunately I get a lot of crashes with the patch. All
> the backtraces look very similar - see the attached example. The root
> cause seems to be that sort_inner_and_outer() sees
>
> inner_path = NULL
>
> I haven't investigated this very much, but I suppose the GOO code should
> be calling set_cheapest() from somewhere.
>
FWIW, after looking at the failing queries and tweaking them a bit, the
issue seems to be related to aggregates in the select list. For example,
this TPC-DS query (Q7) fails:

    select i_item_id,
           avg(ss_quantity) agg1,
           avg(ss_list_price) agg2,
           avg(ss_coupon_amt) agg3,
           avg(ss_sales_price) agg4
      from store_sales, customer_demographics, date_dim, item, promotion
     where ss_sold_date_sk = d_date_sk and
           ss_item_sk = i_item_sk and
           ss_cdemo_sk = cd_demo_sk and
           ss_promo_sk = p_promo_sk and
           cd_gender = 'F' and
           cd_marital_status = 'W' and
           cd_education_status = 'Primary' and
           (p_channel_email = 'N' or p_channel_event = 'N') and
           d_year = 1998
     group by i_item_id
     order by i_item_id
     LIMIT 100;

but if I remove the aggregates, it plans just fine:

    select i_item_id
      from store_sales, customer_demographics, date_dim, item, promotion
     where ss_sold_date_sk = d_date_sk and
           ss_item_sk = i_item_sk and
           ss_cdemo_sk = cd_demo_sk and
           ss_promo_sk = p_promo_sk and
           cd_gender = 'F' and
           cd_marital_status = 'W' and
           cd_education_status = 'Primary' and
           (p_channel_email = 'N' or p_channel_event = 'N') and
           d_year = 1998
     group by i_item_id
     order by i_item_id
     LIMIT 100;

The backtrace matches the one I already posted, so I'm not going to post
it again.
I looked at a couple more failing queries, and removing the aggregates
fixes them too. There may be other issues or crashes, of course.
regards
--
Tomas Vondra