From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | 刘瑞 <whx20202(at)gmail(dot)com> |
Cc: | postgres performance list <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Unnecessary DISTINCT while primary key in SQL |
Date: | 2017-11-06 00:34:25 |
Message-ID: | CAKJS1f9q0j3BgMUsDbtf9=ecfVLnqvkYB44MXj0gpVuamcN8Xw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On 5 November 2017 at 04:20, 刘瑞 <whx20202(at)gmail(dot)com> wrote:
> CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text)
> INSERT into test_tbl select generate_series(1,10000000), 'test';
>
> SQL with DISTINCT:
> test=# explain analyze select distinct col, k from test_tbl order by k limit
> 1000;
> QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=1277683.22..1277690.72 rows=1000 width=36) (actual
> time=12697.994..12698.382 rows=1000 loops=1)
> -> Unique (cost=1277683.22..1329170.61 rows=6864985 width=36) (actual
> time=12697.992..12698.311 rows=1000 loops=1)
> -> Sort (cost=1277683.22..1294845.68 rows=6864985 width=36)
> (actual time=12697.991..12698.107 rows=1000 loops=1)
> Sort Key: k, col
> Sort Method: external sort Disk: 215064kB
> -> Seq Scan on test_tbl (cost=0.00..122704.85 rows=6864985
> width=36) (actual time=0.809..7561.215 rows=10000000 loops=1)
> Planning time: 2.368 ms
> Execution time: 12728.471 ms
> (8 rows)
The current planner does not make much of an effort into recording
which columns remain distinct at each level. I have ideas on how to
improve this and it would include improving your case here.
9.6 did improve a slight variation of your query, but this was for
GROUP BY instead of DISTINCT. Probably there's no reason why the same
optimisation could not be applied to DISTINCT, I just didn't think of
it when writing the patch.
The item from the release notes [1] reads "Ignore GROUP BY columns
that are functionally dependent on other columns"
So, if you were to write the query as:
explain analyze select col, k from test_tbl group by col, k order by k
limit 1000;
It should run much more quickly, although still not as optimal as it could be.
[1] https://www.postgresql.org/docs/9.6/static/release-9-6.html
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Adam Torres | 2017-11-06 13:18:00 | Performance loss upgrading from 9.3 to 9.6 |
Previous Message | 刘瑞 | 2017-11-04 15:20:42 | Unnecessary DISTINCT while primary key in SQL |