Request for help with slow query

From: "Woolcock, Sean" <Sean(dot)Woolcock(at)emc(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Request for help with slow query
Date: 2012-10-29 17:41:23
Message-ID: 998490E92A198A48BE60F3A9C44CA3F1010A7DCF3901@MX40A.corp.emc.com
Lists: pgsql-performance

Hi, thanks for any help. I've tried to be thorough, but let me know if I should
provide more information.

A description of what you are trying to achieve and what results you expect:
I have a large (3 million row) table called "tape" that represents files,
which I join to a small (100 row) table called "filesystem" that represents
filesystems. I have a web interface that allows you to sort by a number of
fields in the tape table and view the results 100 at a time (using LIMIT
and OFFSET).

The data only changes hourly and I do a "vacuum analyze" after all changes.
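For concreteness (the exact commands aren't spelled out above), the hourly maintenance amounts to something like:

```
-- run after the hourly data load
vacuum analyze tape;
vacuum analyze filesystem;
```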

The tables are defined as:

create table filesystem (
    id           serial primary key,
    host         varchar(256),
    storage_path varchar(2048) not null check (storage_path != ''),
    mounted_on   varchar(2048) not null check (mounted_on != ''),
    constraint unique_fs unique (host, storage_path)
);

create table tape (
    id               serial primary key,
    volser           char(255)     not null check (volser != ''),
    path             varchar(2048) not null check (path != ''),
    scratched        boolean       not null default FALSE,
    last_write_date  timestamp     not null default current_timestamp,
    last_access_date timestamp     not null default current_timestamp,
    filesystem_id    integer       references filesystem not null,
    size             bigint        not null check (size >= 0),
    worm_status      char,
    encryption       char,
    job_name         char(8),
    job_step         char(8),
    dsname           char(17),
    recfm            char(3),
    block_size       int,
    lrecl            int,
    constraint filesystem_already_has_that_volser unique (filesystem_id, volser)
);

An example query that's running slowly for me is:

select tape.volser,
       tape.path,
       tape.scratched,
       tape.size,
       extract(epoch from tape.last_write_date) as last_write_date,
       extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem on (tape.filesystem_id = filesystem.id)
order by last_write_date desc
limit 100
offset 100;

On Postgres 8.1.17 this takes about 60 seconds. I would like it to be faster.

Here's the explain output:

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3226201.13..3226201.38 rows=100 width=308) (actual time=66311.929..66312.053 rows=100 loops=1)
   ->  Sort  (cost=3226200.88..3234250.28 rows=3219757 width=308) (actual time=66311.826..66311.965 rows=200 loops=1)
         Sort Key: date_part('epoch'::text, tape.last_write_date)
         ->  Hash Join  (cost=3.26..242948.97 rows=3219757 width=308) (actual time=3.165..31680.830 rows=3219757 loops=1)
               Hash Cond: ("outer".filesystem_id = "inner".id)
               ->  Seq Scan on tape  (cost=0.00..178550.57 rows=3219757 width=312) (actual time=2.824..18175.863 rows=3219757 loops=1)
               ->  Hash  (cost=3.01..3.01 rows=101 width=4) (actual time=0.204..0.204 rows=101 loops=1)
                     ->  Seq Scan on filesystem  (cost=0.00..3.01 rows=101 width=4) (actual time=0.004..0.116 rows=101 loops=1)
 Total runtime: 66553.643 ms

Here's a depesz link with that output: http://explain.depesz.com/s/AUR
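One thing I noticed in that plan: the Sort Key is date_part('epoch'::text, tape.last_write_date), because "order by last_write_date" resolves to the select-list alias (the extract(...) expression), not the raw column, so an index on the bare column can't satisfy the sort. Since epoch extraction is monotonic, ordering on the column itself gives the same row order; a sketch of that variant:

```
select tape.volser,
       tape.path,
       tape.scratched,
       tape.size,
       extract(epoch from tape.last_write_date) as last_write_date,
       extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem on (tape.filesystem_id = filesystem.id)
order by tape.last_write_date desc   -- bare column, so tape_last_write_date_idx is usable
limit 100
offset 100;
```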

Things I've tried:

1. I added an index on last_write_date with:

create index tape_last_write_date_idx on tape(last_write_date);

and there was no improvement in query time.

2. I bumped:
effective_cache_size to 1/2 system RAM (1GB)
shared_buffers to 1/4 system RAM (512MB)
work_mem to 10MB
and there was no improvement in query time.

3. I ran the query against the same data in Postgres 9.1.6 rather than 8.1.17
using the same hardware and it was about 5 times faster (nice work,
whoever did that!). Unfortunately upgrading is not an option, so this
is more of an anecdote. I would think the query could go much faster
in either environment with some optimization.
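An aside on the pagination itself: with LIMIT/OFFSET the server has to produce and discard all of the skipped rows on every page, so deep pages get progressively slower. An alternative I haven't tried is keyset pagination, where the web interface remembers the sort key of the last row on the previous page and fetches the next page relative to it, along the lines of:

```
-- :prev_write_date is a placeholder bind value supplied by the application:
-- the last_write_date of the final row on the previous page.
select tape.volser,
       tape.path,
       tape.scratched,
       tape.size,
       extract(epoch from tape.last_write_date) as last_write_date,
       extract(epoch from tape.last_access_date) as last_access_date
from tape
inner join filesystem on (tape.filesystem_id = filesystem.id)
where tape.last_write_date < :prev_write_date
order by tape.last_write_date desc
limit 100;
```

(Ties on last_write_date could skip or repeat rows across page boundaries; adding id as a secondary sort key and comparison would make the paging exact.)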

The EXACT PostgreSQL version you are running:
PostgreSQL 8.1.17 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070115 (SUSE Linux)

How you installed PostgreSQL:
Standard SuSE SLES 10-SP3 RPMs:
postgresql-devel-8.1.17-0.3
postgresql-pl-8.1.17-0.4
postgresql-libs-8.1.17-0.3
postgresql-8.1.17-0.3
postgresql-server-8.1.17-0.3
postgresql-contrib-8.1.17-0.3

Changes made to the settings in the postgresql.conf file:
Only the memory changes mentioned above.

Operating system and version:
Linux acp1 2.6.16.60-0.54.5-default #1 Fri Sep 4 01:28:03 UTC 2009 i686 i686 i386 GNU/Linux

SLES 10-SP3

What program you're using to connect to PostgreSQL:
Perl DBI
Perl v5.8.8

What version of the ODBC/JDBC/ADO/etc driver you're using, if any:
perl-DBD-Pg 1.43

If you're using a connection pool, load balancer or application server, which one you're using and its version:
None.

Is there anything remotely unusual in the PostgreSQL server logs?
No, they're empty.

CPU manufacturer and model:
Intel Celeron CPU 440 @ 2.00GHz

Amount and size of RAM installed:
2GB RAM

Storage details (important for performance and corruption questions):

Do you use a RAID controller?
No.
How many hard disks are connected to the system and what types are they?
We use a single Hitachi HDT72102 250GB 7200 RPM SATA drive.
How are your disks arranged for storage?
Postgres lives on the same 100GB ext3 partition as the OS.

Thanks,
Sean
