| From: | Jeff Davis <jdavis-pgsql(at)empires(dot)org> |
|---|---|
| To: | pgsql-general(at)postgresql(dot)org |
| Subject: | query optimization: aggregate and distinct |
| Date: | 2003-08-20 23:26:26 |
| Message-ID: | 200308201626.26310.jdavis-pgsql@empires.org |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general |
I have below a simplified version of what I'm trying to do. Basically, I am
trying to get both an aggregate (an average) and "most recent" value.
g | v | ts
---+----+----------------------------
1 | 10 | 2003-08-20 16:00:27.010769
1 | 20 | 2003-08-20 16:00:30.380476
2 | 40 | 2003-08-20 16:00:37.399717
2 | 80 | 2003-08-20 16:00:40.265717
I would like, as output, something like this:
g | v | avg | ts
---+----+--------------------+----------------------------
1 | 20 | 15.000000000000000 | 2003-08-20 16:00:30.380476
2 | 80 | 60.000000000000000 | 2003-08-20 16:00:40.265717
which I got by a query like:
SELECT
t2.g,t2.v,t1.avg,t2.ts
FROM
(SELECT
g,avg(v)
FROM t
GROUP BY g
) t1,
(SELECT
DISTINCT ON (g)
* FROM t
ORDER BY g,ts DESC
) t2
WHERE t1.g = t2.g;
That produces the results that I need, but it seems inefficient to join a
table with itself like that. My real query (not this simplified example)
takes 5+ seconds and I suspect this join is why.
Is there a better way?
For my real query, it's using index scans where I'd expect, and I frequently
VACUUM ANALYZE the big table and I have all the stats turned on. Also, I have
more shared buffers than needed to put everything in RAM.
Right now I'm using 7.2.1. Any improvements in 7.3 or 7.4 that would help this
issue?
Regards,
Jeff Davis
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Greg Stark | 2003-08-21 00:19:35 | Re: Collation rules and multi-lingual databases |
| Previous Message | Josh Rovero | 2003-08-20 23:15:40 | Re: 7.4b1 vs 7.3.4 performance |