Quick Links

Using distinct in an aggregate prevents parallel execution?

From:	Thomas Kellerer <spam_eater(at)gmx(dot)net>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Using distinct in an aggregate prevents parallel execution?
Date:	2018-06-06 12:41:25
Message-ID:	pf8kli$i7a$1@blaine.gmane.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

Consider this simplified example:

select c.id,
count(*) as total_orders,
sum(p.price) as total_value
from customer c
join orders o ON c.id = o.customer_id
join order_line ol ON o.id = ol.order_id
join product p ON ol.product_id = p.id
group by c.id;

This uses parallel execution quite nicely: https://explain.depesz.com/s/aSPNn

However, the query is incorrect as it does not count the number of orders, but (essentially) the number of order_lines.

This can easily be fixed using:

select c.id,
count(distinct o.id) as total_orders,
sum(p.price) as total_value
from customer c
join orders o ON c.id = o.customer_id
join order_line ol ON o.id = ol.order_id
join product p ON ol.product_id = p.id
group by c.id;

But in that case Postgres 10.4 decides to no longer use parallel execution: https://explain.depesz.com/s/7Ua3

Which increases the query execution time quite a bit (from 3 to 8 seconds).

Is this a known limitation?

Thomas

Responses

Re: Using distinct in an aggregate prevents parallel execution? at 2018-06-06 14:32:27 from Tom Lane

Browse pgsql-general by date

	From	Date	Subject
Next Message	Adrian Klaver	2018-06-06 14:06:37	Re: Which backend using which pg_temp_N schema?
Previous Message	Lionel Tressens	2018-06-06 12:16:22	Re: Setting up replication from 9.4 to 10.4