Re: Query plan: SELECT vs INSERT from same select

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: Alexander Voytsekhovskyy <young(dot)inbox(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Query plan: SELECT vs INSERT from same select
Date: 2019-07-24 09:24:23
Message-ID: 346DB5C5-3B26-4FA6-9B17-93FDED88A738@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> On 23 Jul 2019, at 22:29, Alexander Voytsekhovskyy <young(dot)inbox(at)gmail(dot)com> wrote:
>
> I have quite complicated query:
>
> SELECT axis_x1, axis_y1, SUM(delivery_price) as v_1 FROM (
> SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, clients.id_client as axis_y1, delivery_data.amount * production_price.price * groups.discount as delivery_price
>
> FROM delivery_data
> JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt)
> JOIN clients ON (client_tt.id_client = clients.id_client)
> JOIN production ON (production.id = delivery_data.id_product)
> JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group = groups.id AND groups.id = clients.id_group)

Are client_tt.id_group and clients.id_group ever different from each other? It looks like you might have redundant information there, but... If they are guaranteed to be the same then you don’t need the JOIN to clients, which would both remove a JOIN and reduce the complexity of the JOIN condition on groups.

Or (assuming the group id’s are indeed supposed to be equal), you could
JOIN clients ON (client_tt.id_client = clients.id_client AND client_tt.id_group = clients.id_group)
instead of putting that condition within the JOIN condition on groups.

I don’t think either option will make a huge difference (the first probably more than the second, as it reduces an entire join), but it could be enough to help the database figure out a better plan.

> LEFT JOIN production_price on (delivery_data.id_product = production_price.id_production AND groups.price_list_id = production_price.price_list_id AND delivery_data.delivery_date BETWEEN production_price.date_from AND production_price.date_to)
>
> WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30'
> AND delivery_data.delivery_group_id IN (...short list of values...)
> AND delivery_data.id_product IN ()) AS tmpsource

You don’t have a price if your goods weren’t produced in the delivery window you set? Or do you have goods that get delivered without having a price?

You seem to be using this query for a report on nett sales by month, but I have my doubts whether that LEFT JOIN, and especially the condition on the production date window, is really what you want: Your formula for delivery_price includes the price column from that LEFT JOIN, so you’re going to get 0 values when there is no production_price record in your delivery-window, resulting in a SUM that’s too low if the product was produced before (or after, but that seems unlikely) the delivery window.

> WHERE TRUE

This line is unnecessary.

> GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

Apparently (I’m new to these statements), CUBE (axis_x1, axis_y1) is a shorthand for the above. They seem to have been introduced at the same time (in 9.6?). See: https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS

> It runs well, took 1s and returns 4000 rows.

I won’t go into the performance issue ash this point, other more knowledgeable people already did.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Tignor 2019-07-24 14:38:20 postgres 9.5 DB corruption
Previous Message Imre Samu 2019-07-24 09:16:05 Re: partition table slow planning