Re: Slow queries on big table

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: "Tyrrill, Ed" <tyrrill_ed(at)emc(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow queries on big table
Date: 2007-05-18 19:36:22
Message-ID: 464E0036.3030804@g2switchworks.com
Lists: pgsql-performance

Tyrrill, Ed wrote:
> I have a two column table with over 160 million rows in it. As the size
> of the table grows queries on this table get exponentially slower. I am
> using version 8.1.5 32-bit on Red Hat Enterprise Linux 3. The hardware
> is an Intel 3 Ghz Xeon with 4GB RAM, and 6 disks in a RAID 5
> configuration. For current testing I am running a single database
> connection with no other applications running on the machine, and the
> swap is not being used at all.
>
> Here is the table definition:
>
> mdsdb=# \d backup_location
> Table "public.backup_location"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  record_id  | bigint  | not null
>  backup_id  | integer | not null
> Indexes:
> "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
> "backup_location_rid" btree (record_id)
> Foreign-key constraints:
> "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
> backups(backup_id) ON DELETE CASCADE
>
> Here is the table size:
>
> mdsdb=# select count(*) from backup_location;
> count
> -----------
> 162101296
> (1 row)
>
> And here is a simple query on this table that takes nearly 20 minutes to
> return fewer than 3000 rows. I ran an analyze immediately before I ran
> this query:
>
> mdsdb=# explain analyze select record_id from backup_location where
> backup_id = 1070;
>
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Index Scan using backup_location_pkey on backup_location
>    (cost=0.00..1475268.53 rows=412394 width=8)
>    (actual time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms
> (3 rows)
>
I've got a few points. Firstly, is your data amenable to partitioning?
If so, that might be a big winner.
Secondly, it might be more efficient for the planner to choose the
backup_location_rid index than the composite primary key index. You
can test this theory with this cool pg trick:

begin;
alter table backup_location drop constraint backup_location_pkey;
explain analyze select record_id from backup_location where backup_id = 1070;
rollback;

to see if it's faster.
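Since DDL in Postgres is transactional, the constraint drop above only
exists inside that transaction, and the rollback puts the primary key
back exactly as it was. Keep in mind that the alter table takes an
exclusive lock on the table until you commit or roll back, so other
sessions will block on it while you run the test.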

> Obviously at this point the application is not usable. If possible we
> would like to grow this table to the 3-5 billion row range, but I don't
> know if that is realistic.
>
> Any guidance would be greatly appreciated.
>

Without knowing more about your usage patterns, it's hard to say. But
partitioning seems like your best choice at the moment.
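If you want to see what that would look like, here's a rough, untested
sketch of 8.1-style partitioning on backup_id using table inheritance
plus check constraints. The child table names and range boundaries are
just made up for illustration:

create table backup_location_p1 (
    check (backup_id >= 1000 and backup_id < 2000)
) inherits (backup_location);

create table backup_location_p2 (
    check (backup_id >= 2000 and backup_id < 3000)
) inherits (backup_location);

-- children don't inherit the parent's indexes, so index each one
create index backup_location_p1_bid on backup_location_p1 (backup_id);
create index backup_location_p2_bid on backup_location_p2 (backup_id);

-- let the planner skip partitions whose check constraint can't match
set constraint_exclusion = on;

-- with exclusion on, this should scan only backup_location_p1
explain select record_id from backup_location where backup_id = 1070;

You'd also need a rule or trigger on the parent to route inserts into
the right child, and a plan for migrating the existing 160 million rows,
but that's the basic shape of it.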
