Re: Slow queries on big table

From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: "Tyrrill, Ed" <tyrrill_ed(at)emc(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow queries on big table
Date: 2007-05-18 19:36:22
Message-ID: 464E0036.3030804@g2switchworks.com
Lists: pgsql-performance

Tyrrill, Ed wrote:
> I have a two column table with over 160 million rows in it. As the size
> of the table grows queries on this table get exponentially slower. I am
> using version 8.1.5 32-bit on Red Hat Enterprise Linux 3. The hardware
> is an Intel 3 Ghz Xeon with 4GB RAM, and 6 disks in a RAID 5
> configuration. For current testing I am running a single database
> connection with no other applications running on the machine, and the
> swap is not being used at all.
>
> Here is the table definition:
>
> mdsdb=# \d backup_location
> Table "public.backup_location"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  record_id  | bigint  | not null
>  backup_id  | integer | not null
> Indexes:
> "backup_location_pkey" PRIMARY KEY, btree (record_id, backup_id)
> "backup_location_rid" btree (record_id)
> Foreign-key constraints:
> "backup_location_bfk" FOREIGN KEY (backup_id) REFERENCES
> backups(backup_id) ON DELETE CASCADE
>
> Here is the table size:
>
> mdsdb=# select count(*) from backup_location;
> count
> -----------
> 162101296
> (1 row)
>
> And here is a simple query on this table that takes nearly 20 minutes to
> return fewer than 3000 rows. I ran an analyze immediately before I ran
> this query:
>
> mdsdb=# explain analyze select record_id from backup_location where
> backup_id = 1070;
>
>                                QUERY PLAN
> ---------------------------------------------------------------------------
>  Index Scan using backup_location_pkey on backup_location
>    (cost=0.00..1475268.53 rows=412394 width=8)
>    (actual time=3318.057..1196723.915 rows=2752 loops=1)
>    Index Cond: (backup_id = 1070)
>  Total runtime: 1196725.617 ms
> (3 rows)
>
I've got a few points. Firstly, is your data amenable to partitioning?
If so, that might be a big winner.
Secondly, it might be more efficient for the planner to choose the
backup_location_rid index than the composite primary key index. You
can test this theory with this cool pg trick:

begin;
alter table backup_location drop constraint backup_location_pkey;
explain analyze select record_id from backup_location where backup_id = 1070;
rollback;

to see if it's faster.
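Since DDL in Postgres is transactional, the constraint drop above only
exists inside that transaction, and the rollback puts the primary key
back exactly as it was. Keep in mind that the alter table takes an
exclusive lock on the table until you commit or roll back, so other
sessions will block on it while you run the test.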

> Obviously at this point the application is not usable. If possible we
> would like to grow this table to the 3-5 billion row range, but I don't
> know if that is realistic.
>
> Any guidance would be greatly appreciated.
>

Without knowing more about your usage patterns, it's hard to say. But
partitioning seems like your best choice at the moment.
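If you want to see what that would look like, here's a rough, untested
sketch of 8.1-style partitioning on backup_id using table inheritance
plus check constraints. The child table names and range boundaries are
just made up for illustration:

create table backup_location_p1 (
    check (backup_id >= 1000 and backup_id < 2000)
) inherits (backup_location);

create table backup_location_p2 (
    check (backup_id >= 2000 and backup_id < 3000)
) inherits (backup_location);

-- children don't inherit the parent's indexes, so index each one
create index backup_location_p1_bid on backup_location_p1 (backup_id);
create index backup_location_p2_bid on backup_location_p2 (backup_id);

-- let the planner skip partitions whose check constraint can't match
set constraint_exclusion = on;

-- with exclusion on, this should scan only backup_location_p1
explain select record_id from backup_location where backup_id = 1070;

You'd also need a rule or trigger on the parent to route inserts into
the right child, and a plan for migrating the existing 160 million rows,
but that's the basic shape of it.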
