Re: The nested view from hell - Restricting a subquerry

From: Nis Jørgensen <nis(at)superlativ(dot)dk>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: The nested view from hell - Restricting a subquerry
Date: 2007-07-22 09:06:45
Message-ID: f7v6na$2gc$1@sea.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Bryce Nesbitt skrev:
> I've got a legacy app with a hefty performance problem. The basic
> problem is stupid design. It takes 10-15 seconds of CPU time to look up
> an invoice.
> Basically it's trying to mash up extra columns on an otherwise simple
> query, and those extra columns are subtotals. Simplified (this looks
> best in a fixed width font):
>
> SELECT max(order_view.order_id),max(order_view.invoice_id)
> ,sum(order_view.mileage)
> FROM (SELECT order_id,invoice_id, 0 as miles FROM eg_order
> UNION
> SELECT order_id,0 , miles FROM eg_order_line)
> order_view GROUP BY order_view.order_id;
>
> A select by order_id is fast. The problem is the application uses
> "select * from view where invoice_id=x", and the second part of the
> UNION returns all possible rows in the database. These get filtered out
> later, but at considerable performance hit.

Just for the record, I believe your simplified example should look like
this (changed "max(order_id)" to "order_id" in outer select , changed
"miles" to "mileage"):

SELECT order_id,
max(order_view.invoice_id),
sum(order_view.mileage)
FROM (SELECT order_id,invoice_id, 0 as mileage FROM eg_order
UNION
SELECT order_id, 0, mileage FROM eg_order_line)
order_view GROUP BY order_view.order_id;

It is pretty clear that the problem comes from joining on the result of
an aggregate. PG apparently is not smart enough to recognize that the
result of a max must be one of the values of the column (meaning that it
can use an index)

It is not clear whether there is a FK relation between eg_order and
eg_order_line and what the PK of eg_order is. If there is a FK, you can
do something along the lines of

SELECT order_id,
invoice_id
COALESCE(sum(mileage),0) as mileage
FROM eg_order LEFT JOIN eg_order_line USING order_id
GROUP BY order_id, invoice_id

If there can be more than one invoice_id per order_id, you might need to add

HAVING invoice_id = (SELECT max(invoice_id) FROM eg_order eo2 WHERE
eg_order.order_id = eo2.order_id)

or similar.

Hope this helps,

Nis

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2007-07-22 18:05:07 Re: The nested view from hell - Restricting a subquerry
Previous Message Bryce Nesbitt 2007-07-22 03:10:55 The nested view from hell - Restricting a subquerry