Re: Simple queries take forever to run

From: Michael Guerin <guerin(at)rentec(dot)com>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Simple queries take forever to run
Date: 2003-08-28 20:07:46
Message-ID: 3F4E6112.2060807@rentec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Stephan Szabo wrote:

>On Thu, 28 Aug 2003, Michael Guerin wrote:
>
>
>
>>Stephan Szabo wrote:
>>
>>
>>
>>>On Thu, 28 Aug 2003, Michael Guerin wrote:
>>>
>>>
>>>
>>>
>>>
>>>>Stephan Szabo wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>On Wed, 27 Aug 2003, Michael Guerin wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>I'm running into some performance problems trying to execute simple
>>>>>>queries.
>>>>>>
>>>>>>postgresql version 7.3.3
>>>>>>.conf params changed from defaults.
>>>>>>shared_buffers = 64000
>>>>>>sort_mem = 64000
>>>>>>fsync = false
>>>>>>effective_cache_size = 400000
>>>>>>
>>>>>>ex. query: select * from x where id in (select id from y);
>>>>>>
>>>>>>There's an index on each table for id. SQL Server takes <1s to return,
>>>>>>postgresql doesn't return at all, neither does explain analyze.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>IN(subquery) is known to run poorly in 7.3.x and earlier. 7.4 is
>>>>>generally much better (for reasonably sized subqueries) but in earlier
>>>>>versions you'll probably want to convert into an EXISTS or join form.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>Something else seems to be going on, even switching to an exists clause
>>>>gives much better but poor performance.
>>>>count(*) where exists clause: Postgresql 19s, SQL Server <1s
>>>>count(*) where not exists: 23.3s SQL Server 1.5s
>>>>
>>>>
>>>>
>>>>
>>>What does explain analyze show for the two queries?
>>>
>>>
>>>
>>>
>>>
>>>
>>explain analyze select count(*) from tbltimeseries where exists(select
>>uniqid from tblobjectname where timeseriesid = uniqid);
>>Aggregate (cost=5681552.18..5681552.18 rows=1 width=0) (actual
>>time=22756.64..22756.64 rows=1 loops=1)
>> -> Seq Scan on tbltimeseries (cost=0.00..5680051.34 rows=600336
>>width=0) (actual time=22.06..21686.78 rows=1200113 loops=1)
>> Filter: (NOT (subplan))
>> SubPlan
>> -> Index Scan using idx_objectname on tblobjectname
>>(cost=0.00..4.70 rows=1 width=4) (actual time=0.01..0.01 rows=0
>>loops=1200673)
>> Index Cond: ($0 = uniqid)
>> Total runtime: 22756.83 msec
>>(7 rows)
>>
>>
>
>Hmm... I'd thought that it had options for a better plan than that.
>
>What do things like:
>
>explain analyze select count(distinct timeseriesid) from tbltimeseries,
> tblobjectname where timeseriesid=uniquid;
>
>and
>
>explain analyze select count(distinct timeseriesid) from
> tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
> where uniqid is null;
>
>give you?
>
>
>
>
much better performance:

explain analyze select count(distinct timeseriesid) from tbltimeseries,
tblobjectname where timeseriesid=uniquid;

Aggregate (cost=7384.03..7384.03 rows=1 width=8) (actual time=668.15..668.15 rows=1 loops=1)
-> Nested Loop (cost=0.00..7380.83 rows=1282 width=8) (actual time=333.31..666.13 rows=561 loops=1)
-> Seq Scan on tblobjectname (cost=0.00..33.82 rows=1282 width=4) (actual time=0.05..4.98 rows=1282 loops=1)
-> Index Scan using xx on tbltimeseries (cost=0.00..5.72 rows=1 width=4) (actual time=0.51..0.51 rows=0 loops=1282)
Index Cond: (tbltimeseries.timeseriesid = "outer".uniqid)
Total runtime: 669.61 msec
(6 rows)

explain analyze select count(distinct timeseriesid) from
tbltimeseries left outer join tblobjectname on (timeseriesid=uniqid)
where uniqid is null;

Aggregate (cost=59144.19..59144.19 rows=1 width=8) (actual time=12699.47..12699.47 rows=1 loops=1)
-> Hash Join (cost=37.02..56142.51 rows=1200673 width=8) (actual time=7.41..6376.12 rows=1200113 loops=1)
Hash Cond: ("outer".timeseriesid = "inner".uniqid)
Filter: ("inner".uniqid IS NULL)
-> Seq Scan on tbltimeseries (cost=0.00..44082.73 rows=1200673 width=4) (actual time=0.01..3561.61 rows=1200673 loops=1)
-> Hash (cost=33.82..33.82 rows=1282 width=4) (actual time=4.84..4.84 rows=0 loops=1)
-> Seq Scan on tblobjectname (cost=0.00..33.82 rows=1282 width=4) (actual time=0.04..2.84 rows=1282 loops=1)
Total runtime: 12699.76 msec
(8 rows)

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Vivek Khera 2003-08-28 20:20:24 Re: The results of my PostgreSQL/filesystem performance tests
Previous Message Sean Chittenden 2003-08-28 19:31:23 Re: The results of my PostgreSQL/filesystem performance tests