Re: [HACKERS] Removing useless DISTINCT clauses

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Removing useless DISTINCT clauses
Date: 2018-01-05 21:34:40
Message-ID: CAMkU=1xg6_jX7dXdN68fx=choBKC9Ds+WM9mRSQu+TsHsy0mLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 6, 2017 at 1:16 AM, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
wrote:

> In [1] we made a change to process the GROUP BY clause to remove any
> group by items that are functionally dependent on some other GROUP BY
> items.
>
> This really just checks if a table's PK columns are entirely present
> in the GROUP BY clause and removes anything else belonging to that
> table.
>
> All this seems to work well, but I totally failed to consider that the
> exact same thing applies to DISTINCT too.
>
> Over in [2], Rui Liu mentions that the planner could do a better job
> for his case.
>
> Using Rui Liu's example:
>
> CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text);
> INSERT into test_tbl select generate_series(1,10000000), 'test';
>
> Master:
>
> postgres=# explain analyze verbose select distinct col, k from
> test_tbl order by k limit 1000;
> QUERY
> PLAN
> ------------------------------------------------------------
> ------------------------------------------------------------
> -------------------------
> Limit (cost=1658556.19..1658563.69 rows=1000 width=9) (actual
> time=8934.962..8935.495 rows=1000 loops=1)
> Output: col, k
> -> Unique (cost=1658556.19..1733557.50 rows=10000175 width=9)
> (actual time=8934.961..8935.460 rows=1000 loops=1)
> Output: col, k
> -> Sort (cost=1658556.19..1683556.63 rows=10000175 width=9)
> (actual time=8934.959..8935.149 rows=1000 loops=1)
> Output: col, k
> Sort Key: test_tbl.k, test_tbl.col
> Sort Method: external merge Disk: 215128kB
> -> Seq Scan on public.test_tbl (cost=0.00..154056.75
> rows=10000175 width=9) (actual time=0.062..1901.728 rows=10000000
> loops=1)
> Output: col, k
> Planning time: 0.092 ms
> Execution time: 8958.687 ms
> (12 rows)
>
> Patched:
>
> postgres=# explain analyze verbose select distinct col, k from
> test_tbl order by k limit 1000;
>
> QUERY PLAN
> ------------------------------------------------------------
> ------------------------------------------------------------
> ----------------------------------
> Limit (cost=0.44..34.31 rows=1000 width=9) (actual time=0.030..0.895
> rows=1000 loops=1)
> Output: col, k
> -> Unique (cost=0.44..338745.50 rows=10000175 width=9) (actual
> time=0.029..0.814 rows=1000 loops=1)
> Output: col, k
> -> Index Scan using test_tbl_pkey on public.test_tbl
> (cost=0.44..313745.06 rows=10000175 width=9) (actual time=0.026..0.452
> rows=1000 loops=1)
> Output: col, k
> Planning time: 0.152 ms
> Execution time: 0.985 ms
> (8 rows)
>
> A patch to implement this is attached.
>
>
Couldn't the Unique node be removed entirely? If k is a primary key, you
can't have duplicates in need of removal.

Or would that be a subject for a different patch?

I think remove_functionally_dependant_groupclauses should have a more
generic name, like remove_functionally_dependant_clauses.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-01-05 21:57:33 Re: [HACKERS] Proposal: Local indexes for partitioned table
Previous Message Peter Eisentraut 2018-01-05 21:30:33 Re: [HACKERS] Transaction control in procedures