Re: efficient storing of urls

From: Shane Wegner <shane-keyword-pgsql(dot)a1e0d9(at)cm(dot)nu>
To: pgsql-general(at)postgresql(dot)org
Cc: Sean Shanny <shannyconsulting(at)earthlink(dot)net>
Subject: Re: efficient storing of urls
Date: 2004-02-28 00:00:09
Message-ID: 20040228000009.GA3231@cm.nu
Lists: pgsql-general

On Fri, Feb 27, 2004 at 06:00:36PM -0500, Sean Shanny wrote:
> Shane,
>
> Can you give an example of a query that has gotten slower due to the
> increasing size of the urls table with an explain analyze?

The database backs a simple traffic-monitoring tool, so we
have a hits table which gets a new row for every URL
accessed. It's a very simple table:

create table hits(
    hit_date date not null,
    hit_time time(0) without time zone not null,
    url_id int references urls(id)
);
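
The urls table itself isn't shown here; a minimal sketch of what it
presumably looks like, inferred only from the id and url columns the
query below touches:

-- hypothetical definition of urls; only the id and url columns are
-- certain, inferred from the query further down
create table urls(
    id serial primary key,
    url text not null
);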

A select to display the 100 most popular pages:
explain analyze
select count(*) as c, url
from hits, urls
where hit_date between '2004-01-01' and '2004-01-31'
  and url_id = urls.id
group by url
order by c desc
limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=320189.71..320189.96 rows=100 width=68) (actual time=34156.080..34156.324 rows=100 loops=1)
   ->  Sort  (cost=320189.71..320700.06 rows=204138 width=68) (actual time=34156.068..34156.208 rows=100 loops=1)
         Sort Key: count(*)
         ->  GroupAggregate  (cost=281214.19..283255.57 rows=204138 width=68) (actual time=32457.857..33584.861 rows=53460 loops=1)
               ->  Sort  (cost=281214.19..281724.54 rows=204138 width=68) (actual time=32457.690..32873.446 rows=248888 loops=1)
                     Sort Key: urls.url
                     ->  Merge Join  (cost=239594.05..244280.05 rows=204138 width=68) (actual time=21363.547..24385.213 rows=248888 loops=1)
                           Merge Cond: ("outer".url_id = "inner".id)
                           ->  Sort  (cost=168400.38..168914.15 rows=205508 width=4) (actual time=14785.934..15156.772 rows=249350 loops=1)
                                 Sort Key: hits.url_id
                                 ->  Seq Scan on hits  (cost=0.00..148512.07 rows=205508 width=4) (actual time=40.265..12081.506 rows=249350 loops=1)
                                       Filter: ((hit_date >= '2004-01-01'::date) AND (hit_date <= '2004-01-31'::date))
                           ->  Sort  (cost=71193.67..72005.68 rows=324805 width=72) (actual time=6577.430..7422.945 rows=519307 loops=1)
                                 Sort Key: urls.id
                                 ->  Seq Scan on urls  (cost=0.00..7389.05 rows=324805 width=72) (actual time=0.110..1187.617 rows=324805 loops=1)
 Total runtime: 34221.250 ms
(16 rows)

Time: 34224.959 ms
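
For what it's worth, the plan reads both tables with sequential scans,
and the only restriction on hits is the hit_date range. Assuming no such
index already exists, something along these lines might let the planner
avoid the full scan of hits (untested sketch, not part of the setup
shown above):

-- hypothetical index on the hit_date filter; whether the planner uses it
-- depends on how selective the one-month range is
create index hits_hit_date_idx on hits(hit_date);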

S
