Re: BUG #15001: planner cann't distinguish composite index?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, 德哥 <digoal(at)126(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15001: planner cann't distinguish composite index?
Date: 2018-01-17 21:44:16
Message-ID: 24a39222-8bd3-155c-9bd7-ee20b23da7cf@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 01/10/2018 04:35 AM, Amit Kapila wrote:
> On Tue, Jan 9, 2018 at 4:55 PM, PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference: 15001
>> Logged by: Zhou Digoal
>> Email address: digoal(at)126(dot)com
>> PostgreSQL version: 10.1
>> Operating system: centos 7.x x64
>> Description:
>>
>> ```
>> postgres=# create table tbl(c1 int, c2 int, c3 int);
>> CREATE TABLE
>> postgres=# create index idx_tbl on tbl (c1,c2);
>> CREATE INDEX
>> postgres=# insert into tbl select random()*100, random()*10 from
>> generate_series(1,10000000);
>> INSERT 0 10000000
>>
>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------------------
>> Finalize GroupAggregate (cost=176259.41..176337.18 rows=1111 width=16)
>> Group Key: c2, c1
>> -> Sort (cost=176259.41..176276.08 rows=6666 width=16)
>> Sort Key: c2, c1
>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16)
>> Workers Planned: 6
>> -> Partial HashAggregate (cost=174158.32..174169.43
>> rows=1111 width=16)
>> Group Key: c2, c1
>> -> Parallel Index Only Scan using idx_tbl on tbl
>> (cost=0.43..161658.26 rows=1666675 width=8)
>> (9 rows)
>>
>> postgres=# explain select c1,c2 , count(*) from tbl group by c1,c2;
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------------
>> Finalize GroupAggregate (cost=1000.53..176040.80 rows=1111 width=16)
>> Group Key: c1, c2
>> -> Gather Merge (cost=1000.53..175979.69 rows=6666 width=16)
>> Workers Planned: 6
>> -> Partial GroupAggregate (cost=0.43..174169.43 rows=1111
>> width=16)
>> Group Key: c1, c2
>> -> Parallel Index Only Scan using idx_tbl on tbl
>> (cost=0.43..161658.26 rows=1666675 width=8)
>> (7 rows)
>> ```
>>
>> i need to set enable_sort=off, so planner can choose the same planner with
>> c1,c2 and c2,c1 group by.
>>
>> ```
>> postgres=# set enable_sort=off;
>> SET
>> postgres=# explain select c1,c2 , count(*) from tbl group by c2,c1;
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------------
>> Finalize HashAggregate (cost=175886.03..175897.14 rows=1111 width=16)
>> Group Key: c2, c1
>> -> Gather (cost=175158.32..175836.03 rows=6666 width=16)
>> Workers Planned: 6
>> -> Partial HashAggregate (cost=174158.32..174169.43 rows=1111
>> width=16)
>> Group Key: c2, c1
>> -> Parallel Index Only Scan using idx_tbl on tbl
>> (cost=0.43..161658.26 rows=1666675 width=8)
>> (7 rows)
>> ```
>>
>
> I don't think the plan chosen is same. If you notice, for c1,c2 the
> plan chosen is Finalize GroupAggregate -> Gather Merge whereas for
> c2,c1 (with sort off) it is Finalize HashAggregate -> Gather. There is
> a lot of difference between both the plans. The output by Gather
> Merge is sorted, so you can directly use GroupAggregate whereas the
> output of Gather is unsorted, so the final plan is HashAggregate.
>
> Now, here one can wonder why the planner hasn't chosen the path
> without Sort for c2,c1 even when enable_sort=on as the cost of that
> plan is less. If you see the cost difference of plans with sort
> (total_cost - 176337.18) and without sort (total_cost - 175897.14), it
> is marginal and planner thinks that they are fuzzily same. The same
> is true for startup costs as well. Now, if both have same costs
> (fuzzily), it gives preference to the sorted path.
>

The original report is quite unclear - it only shows some plans but does
not explain why it's a bug or what's the expected behavior.

The fuzzy comparison of costs certainly explains at least some of it,
but my feeling is it's likely related to the fact that group by has to
match a composite index including the order of columns.

That is, an index defined on (c1,c2) is unusable for (GROUP BY c2,c1).
Which is why GROUP BY c1,c2 uses GroupAggregate, while GROUP BY c2,c1
uses HashAggregate.

This is a known limitation, see:

[1]
https://www.postgresql.org/message-id/CAMkU%3D1w9scCPMicmhzrAkUUFVnmJ5JKok5K2KD_qNvXYyEQZ%2BA%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/3c651d99-a943-4e21-4ec9-604b142c85e0%40aegee.org

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tomas Vondra 2018-01-17 21:58:51 Re: BUG #15006: "make check" error if current user is "user"
Previous Message Tom Lane 2018-01-17 21:00:16 Re: PostgreSQL crashes with SIGSEGV