From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v19)
Date: 2017-01-04 14:21:23
Message-ID: CAFiTN-vjNHSEWn9M5RqZQV7KWoFT97W=Nc14YikgUxbw2qcxDg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Jan 4, 2017 at 8:05 AM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> Attached is v22 of the patch series, rebased to current master and fixing
> the reported bug. I haven't made any other changes - the issues reported by
> Petr are mostly minor, so I've decided to wait a bit more for (hopefully)
> other reviews.
v22 fixes the problem I reported. In my tests, I observed that the group
by estimation is much better with the ndistinct statistics.
Here is one example:
postgres=# explain analyze select p_brand, p_type, p_size from part
group by p_brand, p_type, p_size;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=37992.00..38992.00 rows=100000 width=36) (actual
time=953.359..1011.302 rows=186607 loops=1)
Group Key: p_brand, p_type, p_size
-> Seq Scan on part (cost=0.00..30492.00 rows=1000000 width=36)
(actual time=0.013..163.672 rows=1000000 loops=1)
Planning time: 0.194 ms
Execution time: 1020.776 ms
(5 rows)
postgres=# CREATE STATISTICS s2 WITH (ndistinct) on (p_brand, p_type,
p_size) from part;
CREATE STATISTICS
postgres=# analyze part;
ANALYZE
postgres=# explain analyze select p_brand, p_type, p_size from part
group by p_brand, p_type, p_size;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=37992.00..39622.46 rows=163046 width=36) (actual
time=935.162..992.944 rows=186607 loops=1)
Group Key: p_brand, p_type, p_size
-> Seq Scan on part (cost=0.00..30492.00 rows=1000000 width=36)
(actual time=0.013..156.746 rows=1000000 loops=1)
Planning time: 0.308 ms
Execution time: 1001.889 ms
In the above example:
Without MVStat -> estimated: 100000, actual: 186607
With MVStat    -> estimated: 163046, actual: 186607
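As a side note, the actual group count reported by EXPLAIN ANALYZE can be
cross-checked with a plain distinct count (a sketch against the same TPC-H
part table used above; it simply counts the distinct grouping combinations):

```sql
-- Count the distinct (p_brand, p_type, p_size) combinations directly;
-- this should match the "rows=186607" reported by EXPLAIN ANALYZE.
SELECT count(*)
FROM (SELECT DISTINCT p_brand, p_type, p_size FROM part) AS groups;
```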
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com