
Re: efficient storing of urls

From: Shane Wegner <shane-keyword-pgsql.a1e0d9@cm.nu>
To: pgsql-general@postgresql.org
Cc: Sean Shanny <shannyconsulting@earthlink.net>
Subject: Re: efficient storing of urls
Date: 2004-02-28 00:00:09
Message-ID: 20040228000009.GA3231@cm.nu
Lists: pgsql-general
On Fri, Feb 27, 2004 at 06:00:36PM -0500, Sean Shanny wrote:
> Shane,
> 
> Can you give an example of a query that has gotten slower due to the 
> increasing size of the urls table with an explain analyze?

The database backs a simple traffic-monitoring tool, so we have
a hits table that gets a new row for every URL accessed.
It's a very simple table:

create table hits(
    hit_date date not null,
    hit_time time(0) without time zone not null,
    url_id int references urls(id)
);

A select to display the 100 most popular pages:
explain analyze
select count(*) as c, url
from hits, urls
where hit_date between '2004-01-01' and '2004-01-31'
  and url_id = urls.id
group by url
order by c desc
limit 100;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=320189.71..320189.96 rows=100 width=68) (actual time=34156.080..34156.324 rows=100 loops=1)
   ->  Sort  (cost=320189.71..320700.06 rows=204138 width=68) (actual time=34156.068..34156.208 rows=100 loops=1)
         Sort Key: count(*)
         ->  GroupAggregate  (cost=281214.19..283255.57 rows=204138 width=68) (actual time=32457.857..33584.861 rows=53460 loops=1)
               ->  Sort  (cost=281214.19..281724.54 rows=204138 width=68) (actual time=32457.690..32873.446 rows=248888 loops=1)
                     Sort Key: urls.url
                     ->  Merge Join  (cost=239594.05..244280.05 rows=204138 width=68) (actual time=21363.547..24385.213 rows=248888 loops=1)
                           Merge Cond: ("outer".url_id = "inner".id)
                           ->  Sort  (cost=168400.38..168914.15 rows=205508 width=4) (actual time=14785.934..15156.772 rows=249350 loops=1)
                                 Sort Key: hits.url_id
                                 ->  Seq Scan on hits  (cost=0.00..148512.07 rows=205508 width=4) (actual time=40.265..12081.506 rows=249350 loops=1)
                                       Filter: ((hit_date >= '2004-01-01'::date) AND (hit_date <= '2004-01-31'::date))
                           ->  Sort  (cost=71193.67..72005.68 rows=324805 width=72) (actual time=6577.430..7422.945 rows=519307 loops=1)
                                 Sort Key: urls.id
                                 ->  Seq Scan on urls  (cost=0.00..7389.05 rows=324805 width=72) (actual time=0.110..1187.617 rows=324805 loops=1)
 Total runtime: 34221.250 ms
(16 rows)

Time: 34224.959 ms
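For what it's worth, the plan above spends most of its time seq-scanning
hits on the hit_date filter and sorting both join inputs. A sketch of
indexes that might let the planner avoid that (index names are
illustrative, not from the original schema; untested here):

-- Hypothetical indexes; the plan seq-scans hits on hit_date
-- and sorts hits.url_id for the merge join.
create index hits_hit_date_idx on hits (hit_date);
create index hits_url_id_idx on hits (url_id);
-- urls.id should already be indexed if it is declared as the
-- primary key that the url_id foreign key references.

After creating them, re-running the same explain analyze would show
whether the planner actually picks them up for this date range.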

S

