Query performance PLEASE HELP

From: Dmitry Tkach <dmitry(at)openratings(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Query performance PLEASE HELP
Date: 2003-01-31 19:30:46
Message-ID: 3E3ACEE6.7060603@openratings.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi, everybody!

I have spent the last few days trying to figure out why one (or rather some) of my queries takes so long time, but I am about to give up :-(

Something seems to be definitely wrong here, but I am out of ideas :-(
I was wonderring if somebody could please take a look at this and, perhaps, give me some clue what is going on? I would greatly appreciate that...

There are two tables involved:

create table tradestyle
(
duns int,
type int,
name text not null
);
create index tradestyle_idx on tradestyle (name);
create unique index tradestyle_duns_idx on tradestyle (duns,type);

create table managed_supplier
(
id serial primary key,
duns int not null,
subscriber int not null
);
create unique index managed_supplier_idx on managed_supplier (subscriber, duns);

tradestyle table is huge (about 50 million rows), managed_supplier is much smaller (under 100K)

The query I am fighting with is something like:

select * from tradestyle ts, managed_supplier ms where ts.duns=ms.duns and ts.name like 'POST%' and ms.subscriber=74 order by ts.name limit 10;

This takes close to 20 minutes (!) to run for the first time (it returns almost instanteneouly if I run it again, but will take about 20 minutes
again if I modify the name parameter).

Explain analyze says:

Limit (cost=61.91..61.91 rows=1 width=192) (actual time=439435.47..439435.50 rows=10 loops=1)
-> Sort (cost=61.91..61.91 rows=1 width=192) (actual time=439435.47..439435.48 rows=11 loops=1)
-> Nested Loop (cost=0.00..61.90 rows=1 width=192) (actual time=7589.68..439423.75 rows=110 loops=1)
-> Index Scan using managed_supplier_idx on managed_supplier ms (cost=0.00..22.02 rows=5 width=157) (actual time=6.72..3009.90 rows=14365 loops=1)
-> Index Scan using tradestyle_duns_idx on tradestyle ts (cost=0.00..6.97 rows=1 width=35) (actual time=30.34..30.37 rows=0 loops=14365)
Total runtime: 439436.45 msec

Well... this time it was just 7 minutes, but I've seen it beeing 20 (with a different param for the name) too, and, 7 minutes is still kinda a lot!

I figure, what's killing me here, is having to read the large number of rows from disk for the nested loop... But here is what I get from the stats:
(The total number of blocks read times block size over 439 seconds it took to run):

select (sum(heap_blks_read) + sum(idx_blks_read))*8192/439 from pg_statio_user_tables where relname in ('tradestyle', 'managed_supplier');
-[ RECORD 1 ]---------------
?column? | 574391.6902050114

500K per second???? This is slower than my DSL line!!!

I do believe, it is really the physical read that takes most of the time
(I sampled it by stopping in the debugger periodically - and every time it was sitting in __libc_read (),
and also, if I run the same exact query again, it returns instanteneously), but I can't understand why is it
taking THIS long????

It is also NOT a faulty drive problem - I have this same database installed on a few different boxes, and I have the same problem everywhere.

If you guys have any idea what can possibly be going on here, PLEASE HELP!!!!

Thanks a lot!

Dima.

P.S. Here is one more example, with a different name param, it chooses to use a different plan (that actually makes more sense to me, but whatever),
and took just under 5 minutes to run (but, the "improvement" is just an illusion, because I ran this thing after the previous one - so, most of the stuff was cached):

Limit (cost=0.00..16.14 rows=1 width=192) (actual time=6926.37..297527.99 rows=10 loops=1)
-> Nested Loop (cost=0.00..16.14 rows=1 width=192) (actual time=6926.36..297527.94 rows=11 loops=1)
-> Index Scan using tradestyle_name_idx on tradestyle ts (cost=0.00..7.98 rows=1 width=35) (actual time=51.99..295646.78 rows=41020 loops=1)
-> Index Scan using managed_supplier_idx on managed_supplier ms (cost=0.00..5.82 rows=1 width=157) (actual time=0.04..0.04 rows=0 loops=41020)
Total runtime: 297528.31 msec

I'll greatly appreciate any ideas you could come up with.

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Doug McNaught 2003-01-31 19:35:48 Re: Clearing tables questions
Previous Message Tom Lane 2003-01-31 19:09:02 Re: ALTER STATISTIC and dump ?