
Re: Slow queries on big table

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: "Tyrrill, Ed" <tyrrill_ed(at)emc(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow queries on big table
Date: 2007-05-18 19:36:22
Message-ID: 464E0036.3030804@g2switchworks.com
Lists: pgsql-performance
Tyrrill, Ed wrote:
> I have a two-column table with over 160 million rows in it.  As the table
> grows, queries against it get exponentially slower.  I am
> using version 8.1.5 32-bit on Red Hat Enterprise Linux 3.  The hardware
> is an Intel 3 Ghz Xeon with 4GB RAM, and 6 disks in a RAID 5
> configuration.  For current testing I am running a single database
> connection with no other applications running on the machine, and the
> swap is not being used at all.
>
> Here is the table definition:
>
> mdsdb=# \d backup_location
>  Table "public.backup_location"
>   Column   |  Type   | Modifiers
> -----------+---------+-----------
>  record_id | bigint  | not null
>  backup_id | integer | not null
> Indexes:
>     "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
>     "backup_location_rid" btree (record_id)
> Foreign-key constraints:
>     "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
> backups(backup_id) ON DELETE CASCADE
>  
> Here is the table size:
>
> mdsdb=# select count(*) from backup_location;
>    count
> -----------
>  162101296
> (1 row)
>
> And here is a simple query on this table that takes nearly 20 minutes to
> return less than 3000 rows.  I ran an analyze immediately before I ran
> this query:
>
> mdsdb=# explain analyze select record_id from backup_location where
> backup_id = 1070;
>  
> QUERY PLAN
> --------------------------------------------------------------------------
>  Index Scan using backup_location_pkey on backup_location
>      (cost=0.00..1475268.53 rows=412394 width=8)
>      (actual time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms
> (3 rows)
>   
I've got a few points.  Firstly, is your data amenable to partitioning?
If so, that could be a big win.
Secondly, it might be more efficient for the planner to use the
backup_location_rid index rather than the composite primary key index.  You
can test that theory with this cool pg trick:

begin;
alter table backup_location drop constraint backup_location_pkey;
explain analyze select record_id from backup_location where backup_id = 1070;
rollback;

to see if it's faster.
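
This works because DDL in PostgreSQL is transactional: the alter table hides
the primary key (and its index) from the planner only inside your own
transaction, and the rollback puts both back exactly as they were.  Just be
aware that the alter takes an exclusive lock on the table, so other sessions
will block on it until you roll back; on a dedicated test box that's not an
issue.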

> Obviously at this point the application is not usable.  If possible we
> would like to grow this table to the 3-5 billion row range, but I don't
> know if that is realistic.
>
> Any guidance would be greatly appreciated.
>   

Without knowing more about your usage patterns, it's hard to say.  But 
partitioning seems like your best choice at the moment.
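
For what it's worth, on 8.1 partitioning means inheritance plus CHECK
constraints and constraint exclusion.  A minimal sketch, assuming you split
on ranges of backup_id (the boundaries and names below are made up purely
for illustration):

-- one child table per range of backup_id
create table backup_location_p1 (
    check (backup_id >= 1000 and backup_id < 2000)
) inherits (backup_location);

-- index each child on the column you actually filter by
create index backup_location_p1_bid on backup_location_p1 (backup_id);

-- route new rows into the matching child
create rule backup_location_ins_p1 as
    on insert to backup_location
    where (backup_id >= 1000 and backup_id < 2000)
    do instead
    insert into backup_location_p1 (record_id, backup_id)
        values (new.record_id, new.backup_id);

-- let the planner skip children whose check constraint rules them out
set constraint_exclusion = on;

With something like that in place, a query on backup_id = 1070 only has to
touch the one child that can contain it, and each child's index stays small
enough to be useful.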
