Re: extracting min date and grouping

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Keith Worthington <keithw(at)narrowpathinc(dot)com>
Cc: PostgreSQL Novice <pgsql-novice(at)postgresql(dot)org>
Subject: Re: extracting min date and grouping
Date: 2004-12-22 06:52:40
Message-ID: 20041222065240.GA95986@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Tue, Dec 21, 2004 at 10:37:56PM -0500, Keith Worthington wrote:

> scan_timestamp | item_id | quantity | employee_id | void
> ---------------------+---------+----------+-------------+------
> 2004-12-20 16:09:47 | SEB12 | 555 | 116 | f
> 2004-12-20 16:10:03 | B346.0 | 555 | 116 | f
> 2004-12-20 16:10:11 | B346.5 | 888 | 116 | f
> 2004-12-20 16:09:33 | SAC38 | 66 | 116 | f
> 2004-12-19 09:05:29 | SNAP50 | 2255 | 116 | f
> 2004-12-19 09:05:39 | RSN2222 | 525 | 116 | f
> 2004-12-19 09:05:49 | SAC38 | 658 | 116 | f

It would be helpful to see the SQL statements used to create and
populate the sample table -- that way readers can paste them into
their database and experiment with the queries you're after.

> I can get the min date but isn't there an easier/faster way?
> IPADB=# SELECT CAST(extract(year from min(scan_timestamp)) || extract(month
> from min(scan_timestamp)) || extract(day from min(scan_timestamp)) AS date) AS
> inv_date FROM inventory.tbl_scanner;

SELECT MIN(scan_timestamp)::DATE AS inv_date FROM inventory.tbl_scanner;

If the table is large then you could speed up the operation by
creating an index on DATE(scan_timestamp) and using ORDER BY and
LIMIT (a PostgreSQL trick to speed up certain aggregate queries):

CREATE INDEX tbl_scanner_date_idx
ON inventory.tbl_scanner (DATE(scan_timestamp));

SELECT DATE(scan_timestamp) AS inv_date
FROM inventory.tbl_scanner
ORDER BY DATE(scan_timestamp) LIMIT 1;

If you use EXPLAIN ANALYZE on the various queries, you should see
that the last one uses the index while the others do a sequential
scan.

> The group and the sum is straightforward but I get this
> IPADB=# SELECT CAST(extract(year from min(scan_timestamp)) || extract(month
> from min(scan_timestamp)) || extract(day from min(scan_timestamp)) AS date) AS
> inventory_date, item_id, sum(quantity) as total FROM inventory.tbl_scanner
> GROUP BY item_id;
> inventory_date | item_id | total
> ----------------+---------+-------
> 2004-12-20 | B346.0 | 555
> 2004-12-20 | B346.5 | 888
> 2004-12-19 | RSN2222 | 525
> 2004-12-19 | SAC38 | 724
> 2004-12-20 | SEB12 | 555
> 2004-12-19 | SNAP50 | 2255
> (6 rows)
>
> When what I really want is this.
> inventory_date | item_id | total
> ----------------+---------+-------
> 2004-12-19 | B346.0 | 555
> 2004-12-19 | B346.5 | 888
> 2004-12-19 | RSN2222 | 525
> 2004-12-19 | SAC38 | 724
> 2004-12-19 | SEB12 | 555
> 2004-12-19 | SNAP50 | 2255

Here's one way to get that result for your example data set -- it
gets the table's minimum date in a subselect and cross joins it to
the table's records:

SELECT inventory_date, item_id, SUM(quantity) AS total
FROM (SELECT MIN(scan_timestamp)::DATE AS inventory_date
FROM inventory.tbl_scanner) AS s
CROSS JOIN inventory.tbl_scanner
GROUP BY inventory_date, item_id
ORDER BY item_id;

Why do you want all records to have an inventory_date of 2004-12-19?
Because that's the beginning of an inventory period? If so, how
is that beginning date determined, i.e., how do you know when one
period ends and another begins? If we knew more about what you're
trying to do then maybe we could suggest improvements.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Michael Fuhr 2004-12-22 07:07:03 Re: Problem with ALIAS?
Previous Message Keith Worthington 2004-12-22 05:54:57 Problem with ALIAS?