Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: James Coleman <jtc331(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "could not find pathkey item to sort" for TPC-DS queries 94-96
Date: 2021-04-15 00:21:02
Message-ID: CA+TgmoZ2m5eBCmSwams2RbmomEF_a_UvMs7ctnoBWnnMkfjZTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 14, 2021 at 5:43 PM James Coleman <jtc331(at)gmail(dot)com> wrote:
> The query in question is:
> select count(*)
> from store_sales
> ,household_demographics
> ,time_dim, store
> where ss_sold_time_sk = time_dim.t_time_sk
> and ss_hdemo_sk = household_demographics.hd_demo_sk
> and ss_store_sk = s_store_sk
> and time_dim.t_hour = 15
> and time_dim.t_minute >= 30
> and household_demographics.hd_dep_count = 7
> and store.s_store_name = 'ese'
> order by count(*)
> limit 100;
>
> From debugging output it looks like this is the plan being chosen
> (cheapest total path):
> Gather(store_sales household_demographics time_dim) rows=60626
> cost=3145.73..699910.15
> HashJoin(store_sales household_demographics time_dim)
> rows=25261 cost=2145.73..692847.55
> clauses: store_sales.ss_hdemo_sk =
> household_demographics.hd_demo_sk
> HashJoin(store_sales time_dim) rows=252609
> cost=1989.73..692028.08
> clauses: store_sales.ss_sold_time_sk =
> time_dim.t_time_sk
> SeqScan(store_sales) rows=11998564
> cost=0.00..658540.64
> SeqScan(time_dim) rows=1070
> cost=0.00..1976.35
> SeqScan(household_demographics) rows=720
> cost=0.00..147.00

This doesn't really make sense to me given the strack trace in the OP.
That seems to go Limit -> Sort -> Agg -> NestLoop -> NestLoop ->
NestLoop -> GatherMerge -> Sort. If the plan were as you have it here,
there would be no Sort and no Gather Merge, so where would be getting
a failure related to pathkeys?

I think if we can get the correct plan the thing to look at would be
the tlists at the relevant levels of the plan.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-04-15 00:21:58 Re: New IndexAM API controlling index vacuum strategies
Previous Message Peter Geoghegan 2021-04-15 00:20:06 Re: New IndexAM API controlling index vacuum strategies