| From: | Thomas Kellerer <spam_eater(at)gmx(dot)net> |
|---|---|
| To: | pgsql-general(at)postgresql(dot)org |
| Subject: | Using distinct in an aggregate prevents parallel execution? |
| Date: | 2018-06-06 12:41:25 |
| Message-ID: | pf8kli$i7a$1@blaine.gmane.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
Consider this simplified example:
select c.id,
count(*) as total_orders,
sum(p.price) as total_value
from customer c
join orders o ON c.id = o.customer_id
join order_line ol ON o.id = ol.order_id
join product p ON ol.product_id = p.id
group by c.id;
This uses parallel execution quite nicely: https://explain.depesz.com/s/aSPNn
However, the query is incorrect as it does not count the number of orders, but (essentially) the number of order_lines.
This can easily be fixed using:
select c.id,
count(distinct o.id) as total_orders,
sum(p.price) as total_value
from customer c
join orders o ON c.id = o.customer_id
join order_line ol ON o.id = ol.order_id
join product p ON ol.product_id = p.id
group by c.id;
But in that case Postgres 10.4 decides to no longer use parallel execution: https://explain.depesz.com/s/7Ua3
Which increases the query execution time quite a bit (from 3 to 8 seconds).
Is this a known limitation?
Thomas
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Adrian Klaver | 2018-06-06 14:06:37 | Re: Which backend using which pg_temp_N schema? |
| Previous Message | Lionel Tressens | 2018-06-06 12:16:22 | Re: Setting up replication from 9.4 to 10.4 |