selects from large tables

From: Nikk Anderson <Nikk(dot)Anderson(at)parallel(dot)ltd(dot)uk>
To: pgsql-performance(at)postgresql(dot)org
Subject: selects from large tables
Date: 2002-11-18 12:32:45
Message-ID: DA1274E682D3734B8802904A9B36124C298A93@nic-nts1.nic.parallel.ltd.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi All,

We are using Postgres 7.1, on Solaris 8 - hardware is a 400mhz Netra X1,
512Mb ram, with the database on a separate partition.

Our main result tables are getting really big, and we don't want to delete
any data yet. Currently, our largest table has around 10 million rows and
is going up at a rate of around 1 million per month. The table has 13
integer, one boolean and one timestamp column. We index the table on an ID
number and the timestamp. We vacuum analyse the table every night. The
performance has steadily degraded, and the more data we try and select, the
longer the select queries take.
The queries are not complex, and do not involve any unions etc, eg:

SELECT * FROM table_name WHERE column1 = 454 AND time BETWEEN '2002-10-13
13:44:00.0' AND '2002-11-14'

SELECT count(DISTINCT id) FROM table_name WHERE column1 = 454 AND time
BETWEEN '2002-10-13 13:44:00.0' AND '2002-11-14

See various queries and explains at the end this email for more info on the
type of queries we are doing.
Most of the queries use a sequence scan - disabling this and forcing index
scan decreases performance further for those queries.

These queries are sometimes taking over 2 minutes to perform!!!! If we
reduce the table size significantly (i.e. around 1 million rows)is is
obviously faster - down to a few seconds.

We then tried the DB on a clean installation of Solaris 9, on a dual 400mhz
processor SunE250 with 2Gb ram, and 2 scsi 17gb disks. We put the database
onto the second disk. Surprisingly the performance is only 5-10% greater.
I expected far more, due to the increased power of the machine. Looking at
the os info on this machine, the IO wait is negligible as is the cpu usage.
So this machine is not working as hard as the Netra X1, though the time
taken to perform queries is not too much different.

We have tried tweaking the shared buffers and sort mem (also tweaking kernel
shared mem size), which make little difference, and in fact if we increase
it to around 25% of total memory performance degrades slightly. We have
changed from the default amount of shared buffers, to 64000 to give us
access to 25% of the total system memory.

Any ideas on how we can select data more quickly from large tables?

Other ideas we had was to split the data over multiple table by id
(resulting in several thousand tables), however this would make management
of the database in terms of keys, triggers and integrity very difficult and
messy.

I hope someone can offer some advice.

Cheers

Nikk

- Queries and explain plans

select count(*) from table_name;
NOTICE: QUERY PLAN:
Aggregate (cost=488700.65..488700.65 rows=1 width=0)
-> Seq Scan on table_name (cost=0.00..439527.12 rows=19669412 width=0)

hawkdb=# explain select count(job_id) from table_name;
NOTICE: QUERY PLAN:
Aggregate (cost=488700.65..488700.65 rows=1 width=4)
-> Seq Scan on table_name (cost=0.00..439527.12 rows=19669412 width=4)

hawkdb=# explain select * from table_name;
NOTICE: QUERY PLAN:
Seq Scan on table_name (cost=0.00..439527.12 rows=19669412 width=57)

hawkdb=# explain select count(*) from table_name where job_id = 13;
NOTICE: QUERY PLAN:
Aggregate (cost=537874.18..537874.18 rows=1 width=0)
-> Seq Scan on table_name (cost=0.00..488700.65 rows=19669412 width=0)

hawkdb=# explain select * from table_name where job_id = 13;
NOTICE: QUERY PLAN:
Seq Scan on http_result (cost=0.00..488700.65 rows=19669412 width=57)

hawkdb=# explain select * from table_name where job_id = 1;
NOTICE: QUERY PLAN:
Index Scan using http_result_pk on table_name (cost=0.00..5.01 rows=1
width=57)

hawkdb=#explain select * from table_name where time > '2002-10-10';
NOTICE: QUERY PLAN:
Seq Scan on table_name (cost=0.00..488700.65 rows=19649743 width=57)

hawkdb=# explain select * from http_result where time < '2002-10-10';
NOTICE: QUERY PLAN:
Index Scan using table_name_time on table_name (cost=0.00..75879.17
rows=19669 width=57)

Nikk Anderson

Parallel ltd.
Cranfield Innovation Centre
University Way
Cranfield
Bedfordshire
MK43 0BT

http://www.nexuswatch.com
http://www.parallel.ltd.uk

Tel: +44 (0)8700 PARALLEL (727255)
Fax: +44 (0)8700 PARAFAX (727232)

******************************************************************
Privileged/Confidential Information may be contained in this
message. If you are not the addressee indicated in this message
(or responsible for delivery of the message to such person), you
may not copy or deliver this message to anyone. In such case, you
should destroy this message and kindly notify the sender by reply
email. Please advise immediately if you or your employer do not
consent to Internet email for messages of this kind. Opinions,
conclusions and other information in this message that do not
relate to the official business of Parallel shall be understood
as neither given nor endorsed by it.

Unless agreed otherwise by way of a signed agreement, any business
conducted by Parallel shall be subject to its Standard Terms
and Conditions which are available upon request.
******************************************************************

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Stephan Szabo 2002-11-18 13:02:03 Re: selects from large tables
Previous Message Stephan Szabo 2002-11-18 11:56:50 Re: Sort time