count of occurences

From: adamcrume(at)hotmail(dot)com (Adam)
To: pgsql-general(at)postgresql(dot)org
Subject: count of occurences
Date: 2001-09-12 22:16:39
Message-ID: 5f64c126.0109121416.7f73bc3f@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I help run a job database and have a table of search records. I want
a query that will return the top 10 jobs by search frequency. I'm
familiar with ORDER BY and LIMIT, so I basically need this:

Given a table search_records:
job_num
-------
1
2
2
3
4
4
4

I want a query that will return:
job_num | count
--------+------
1 |1
2 |2
3 |1
4 |3

I tried

select distinct job_num, (select count(*) from search_records j where
j.job_num=k.job_num) from search_records k

but it is horribly slow (it takes several minutes on a table of about
25k rows!). I assume it scans the entire table for every job_num in
order to count the number of occurences of that job_num, taking order
n^2 time. Since I can easily use job_num as an index (being integers
from 0 to roughly 400 so far) I could just do a "select * from
search_records" and do the counting in PHP (our HTML pre-processor) in
order n time. However, I don't know how to do an order n*log(n) sort
in PHP, just n^2, so there would still be an efficiency problem.
I have Postgresql 7.0.3.
Help is of course greatly appreciated.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Graham Leggett 2001-09-12 22:55:30 Re: Error: Bad Timestamp Format
Previous Message Barry Lind 2001-09-12 22:07:01 Re: Error: Bad Timestamp Format

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Buttafuoco 2001-09-13 00:18:46 Re: Index location patch for review (more pgbench results)
Previous Message Rene Pijlman 2001-09-12 21:41:39 Re: Timezones and time/timestamp values in FE/BE protocol