Many joins: monthly summaries S-L--O--W

From: Michael Glaesmann <grzm(at)myrealbox(dot)com>
To: pgsql-novice(at)postgresql(dot)org
Subject: Many joins: monthly summaries S-L--O--W
Date: 2003-10-21 12:09:56
Message-ID: 7C513675-03BF-11D8-B460-0005029FC1A7@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice pgsql-sql

Hello all!

I've been working with PostgreSQL now for about a year and have really
enjoyed it. And I've been able to impress the boss with very simple
PHP/PostgreSQL web applications, so that's been great too!

Nearly all I've done has been very straightforward and basic: simple
selects from a couple of different tables joined by where clauses. But
now I'd like to do something a bit more complex: show sales per month,
current inventory, and expected duration of inventory at present sales
rates for each item we have. All in one relation :)

Here are the relevant relations

products (code TEXT PK, name TEXT)

orders (product_code TEXT FK products(id), date DATE, branch_id INTEGER
FK branches(id), qty INTEGER)

inventory (product_code INTEGER FK products(id), date DATE, qty INTEGER)

What I'm trying to make is a table with columns
product_code,
product_name,
jan03_qty,...,
half1_qty,
jul03_qty, ...
sep03_qty,
half2_qty,
total_qty,
inv_qty,
inv_date,
est_inv_qty,
months_remaining

where product_name is products(name), jan03_qty is the quantity of
sales in January, 2003, half1_qty is quantity of sales from January
through June, 2003, inv_qty is the latest inventory data we have,
inv_date is the date of that inventory data, est_inv_qty is the
estimated current inventory based on inv_qty and sales since inv_date,
and months_remaining is an estimate of how many months the estimated
inventory will last at average sales rates (calculated using the
average monthly sales for the previous months).

I've got something that works, but it's *extremely* slow. It takes
about 10 minutes running on a 500MHz G4 Mac with 1GB RAM, running OS
10.2.8 and PostgreSQL 7.3.3. Here's what I've done:

select
products.code as product_code,
products.name as product_name,
jan03.qty as jan03_qty,
feb03.qty as feb03_qty ...
inv.qty as inv_qty,
est_inv.qty as est_inv_qty,
months_remaining::numeric(8,1)
from products
left join (
select products.code as product_code, sum(qty) as qty
from orders, products
where products.code = orders.id and
date between '2003-01-01' and '2003-01-31'
group by product_code) as
jan03 on (jan03.product_code = products.code)
left join (
select products.code as product_code, sum(qty) as qty
from orders, products
where products.code = orders.id and
date between '2003-02-01' and '2003-02-28'
group by product_code) as
feb03 on (feb03.product_code = products.code)
left join
-- repeat through total_qty
total on (total.product_code = products.code)
left join (

-- this is where it get's hairy

select
est_inventory.product_code as product_code,
est_inventory.qty,
monthly_averages.monthly_average,
(est_inventory.qty/monthly_average)::numeric(10,1)
as months_remaining
from (
select
inventory.product_code as product_code,
inventory.qty - coalesce(orders.qty,0) as qty
from (
select product_code, date, qty
from current_inventory_view
) as
inventory
left outer join (
select orders.product_code as product_code,
sum(orders.qty) as qty
from (
select product_code, date, qty
from current_inventory_view
) as
inventory,
orders
where
orders.date > inventory.date and
orders.product_code = inventory.product_code
group by orders.product_code
) as
orders on (inventory.product_code = orders.product_code)
) as
est_inventory
left outer join (
select
product_code as product_code,
sum(qty)/ageinmonths(timestamp '9/30/2003',
timestamp '1/1/2003') as monthly_average
from orders
where date between '1/1/2003' and '9/30/2003'
group by product_code
) as
monthly_averages on
(est_inventory.product_code = monthly_averages.product_code)
where monthly_average > 0
) as
remaining on (remaining.product_code = products.code)
left join (
select distinct product_code, date, qty
from current_inventory_view order by date desc
) as
inventory on (inventory.product_code = products.code)
; -- finally

ageinmonths is an sql function that returns a double precision float,
the number of months between $1 and $2, defined as follows
select
12 * date_part ('year',age($1, $2)) +
date_part('month',age($1,$2)) +
date_part('day'), age($1, $2))/30;

current_inventory_view, showing the most recent inventory qty and date,
is defined as
SELECT inv.date, inv.product_code, inv.qty FROM inventory inv, (SELECT
max(inventory.date) AS date, inventory.product_code FROM inventory
GROUP BY inventory.product_code) curr_inv WHERE ((inv.date =
curr_inv.date) AND (inv.product_code = curr_inv.product_code));

orders is indexed on date and product_code (and branch_id, but that
shouldn't matter)
inventory is indexed on date
products is unindexed besides the order_pkey index that's
auto-generated.

I'm using left joins because not all items sell each month. Items that
have zero sales would fall out if I used WHERE clauses.

Now, I've done an EXPLAIN on the monstrosity and it comes to just over
3 pages printed at 8 pt on A3 landscape. It's much more than I can take
in. I've looked through the EXPLAIN documentation and am willing to
learn (which is good, because I definitely need to).

I tried a version of this, truncating the 'hairy' part. It definitely
ran faster, taking less than 5 minutes.

Also, this query requires more upkeep than I'd like. I modify it every
time I want to add a new month. It seems like I should be able to use a
custom function so I can just put in the date span I'd like to cover.

I've started to work on a function and have run into some problems (see
separate post: Custom function problems if you're interested )

Any suggestion would be appreciated. It seems like this should be
something that's easy, and I'm approaching it wrong. If I'm completely
on the wrong track, I'd love to know! Also, places I should/could look
for ways to accomplish this, that'd be great.

If you've made it to here, thanks for your perseverance! : )

Regards,
Michael Glaesemann
grzm myrealbox com

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message jclaudio 2003-10-21 13:37:16 how to create a multi columns return function ?
Previous Message Michael Glaesmann 2003-10-21 12:02:46 Custom function problems

Browse pgsql-sql by date

  From Date Subject
Next Message geraldo 2003-10-21 12:25:14 function problem
Previous Message Dharan 2003-10-21 07:34:24 Writing the SQL queries inside Functions and operators