Re: Slow query with joins

From: Bendik Rognlien Johansen <bendik(dot)johansen(at)gmail(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow query with joins
Date: 2006-01-11 21:30:58
Message-ID: 850AD6F9-08BD-4EF2-A549-407FD87C7974@gmail.com
Lists: pgsql-performance

The sort is definitely the culprit. When I removed it, the query was
instant. I tried setting work_mem = 131072, but it did not seem to
help. I really don't understand this :-( Any other ideas?
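
For anyone reproducing this, what I tried was a per-session override
(a sketch; on 8.x work_mem is a plain integer in kB, and a value edited
in postgresql.conf only takes effect after a reload):

    SET work_mem = 131072;   -- 128 MB, expressed in kB; this session only
    SHOW work_mem;           -- verify the new value actually took effect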

Thanks!

On Jan 11, 2006, at 9:23 PM, Jim C. Nasby wrote:

> I'd try figuring out if the join is the culprit or the sort is (by
> dropping the ORDER BY). work_mem is probably forcing the sort to spill
> to disk, and if your drives are rather busy...
>
> You might also get a win if you re-order the joins to people, contacts,
> addresses, if you know it will have the same result.
>
> In this case LIMIT won't have any real effect, because you have to go
> all the way through with the ORDER BY anyway.
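
A sketch of that isolation test (the SELECT list is cut down here for
brevity; the full list appears in the quoted query below):

    -- Variant 1: the joins without the sort. If this returns quickly,
    -- the sort is to blame.
    EXPLAIN ANALYZE
    SELECT r.id, ad.id, co.id
    FROM people r
    LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
    LEFT OUTER JOIN contacts co ON (r.id = co.record)
    WHERE NOT r.deleted AND r.original IS NULL;

    -- Variant 2: the sort without the outer joins. If this returns
    -- quickly, the joins are to blame.
    EXPLAIN ANALYZE
    SELECT r.id
    FROM people r
    WHERE NOT r.deleted AND r.original IS NULL
    ORDER BY r.id;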
>
> On Wed, Jan 11, 2006 at 08:55:32PM +0100, Bendik Rognlien Johansen wrote:
>> Yes, the rowcount estimates are close to reality; however, it has been
>> a long time since the last VACUUM FULL (there is never a good time).
>>
>> I have clustered the tables, reindexed, analyzed, and vacuumed, and
>> the plan now looks like this:
>>
>>
>> no_people=# EXPLAIN
>> SELECT r.id AS r_id, r.firstname || ' ' || r.lastname AS r_name,
>>        ad.id AS ad_id, ad.type AS ad_type, ad.address AS ad_address,
>>        ad.postalcode AS ad_postalcode, ad.postalsite AS ad_postalsite,
>>        ad.priority AS ad_priority, ad.position[0] AS ad_lat,
>>        ad.position[1] AS ad_lon, ad.uncertainty AS ad_uncertainty,
>>        ad.extra AS ad_extra, ad.deleted AS ad_deleted,
>>        co.id AS co_id, co.type AS co_type, co.value AS co_value,
>>        co.description AS co_description, co.priority AS co_priority,
>>        co.visible AS co_visible, co.searchable AS co_searchable,
>>        co.deleted AS co_deleted
>> FROM people r
>> LEFT OUTER JOIN addresses ad ON (r.id = ad.record)
>> LEFT OUTER JOIN contacts co ON (r.id = co.record)
>> WHERE NOT r.deleted AND r.original IS NULL
>> ORDER BY r.id;
>>                                  QUERY PLAN
>> ----------------------------------------------------------------------------
>> Sort  (cost=182866.49..182943.12 rows=30655 width=587)
>>   Sort Key: r.id
>>   ->  Nested Loop Left Join  (cost=0.00..170552.10 rows=30655 width=587)
>>         ->  Nested Loop Left Join  (cost=0.00..75054.96 rows=26325 width=160)
>>               ->  Index Scan using people_deleted_original_is_null on people r  (cost=0.00..1045.47 rows=23861 width=27)
>>                     Filter: ((NOT deleted) AND (original IS NULL))
>>               ->  Index Scan using addresses_record_idx on addresses ad  (cost=0.00..3.05 rows=4 width=137)
>>                     Index Cond: ("outer".id = ad.record)
>>         ->  Index Scan using contacts_record_idx on contacts co  (cost=0.00..3.32 rows=24 width=431)
>>               Index Cond: ("outer".id = co.record)
>> (10 rows)
>>
>> Looks faster, but still very slow. I added LIMIT 1000 and it has been
>> running for about 25 minutes now with no output. top shows:
>>
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
>> 29994 postgres  18   0 95768  78m  68m R 17.0  7.7  0:53.27 postmaster
>>
>>
>>
>> which is unusual; I usually get 99.9% CPU for just about any query,
>> which leads me to believe this is disk related.
>>
>>
>>
>> postgresql.conf:
>> shared_buffers = 8192            # 8 kB pages, i.e. 64 MB
>> work_mem = 8192                  # kB, i.e. 8 MB per sort/hash
>> maintenance_work_mem = 524288    # kB, i.e. 512 MB
>>
>> Hardware: 2 x 2.8 GHz CPUs, 1 GB RAM
>>
>> Could this be an issue related to lack of VACUUM FULL? The tables get
>> a lot of updates.
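
One way to gauge bloat without taking VACUUM FULL's exclusive lock is
plain VACUUM VERBOSE (a sketch; table names as in the query above):

    VACUUM VERBOSE people;      -- reports page counts plus removable and
    VACUUM VERBOSE addresses;   -- nonremovable row versions for the table
    VACUUM VERBOSE contacts;    -- and each of its indexes

If the page counts are far beyond what the live rows need, plain VACUUM
cannot shrink the files; only CLUSTER or VACUUM FULL reclaims that space.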
>>
>>
>> Thank you very much so far!
>>
>> On Jan 11, 2006, at 4:45 PM, Tom Lane wrote:
>>
>>> Bendik Rognlien Johansen <bendik(dot)johansen(at)gmail(dot)com> writes:
>>>> Has anyone got any tips for speeding up this query? It currently
>>>> takes hours to start.
>>>
>>> Are the rowcount estimates close to reality? The plan doesn't look
>>> unreasonable to me if they are. It might help to increase work_mem
>>> to ensure that the hash tables don't spill to disk.
>>>
>>> Indexes:
>>>     "people_original_is_null" btree (original) WHERE original IS NULL
>>>
>>> This index seems poorly designed: the actual index entries are dead
>>> weight since all of them are necessarily NULL. You might as well make
>>> the index carry something that you frequently test in conjunction with
>>> "original IS NULL". For instance, if this particular query is a common
>>> case, you could replace this index with
>>>
>>> CREATE INDEX people_deleted_original_is_null ON people (deleted)
>>>     WHERE original IS NULL;
>>>
>>> This index is still perfectly usable for queries that only say
>>> "original IS NULL", but it can also filter out rows with the wrong
>>> value of deleted. Now, if there are hardly any rows with deleted =
>>> true, maybe this won't help much for your problem. But in any case
>>> you ought to consider whether you can make the index entries do
>>> something useful.
>>>
>>> regards, tom lane
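
A sketch of applying that suggestion end to end (the ANALYZE and the
verification query are additions for illustration, not part of the
advice above):

    CREATE INDEX people_deleted_original_is_null
        ON people (deleted) WHERE original IS NULL;
    DROP INDEX people_original_is_null;   -- superseded by the index above
    ANALYZE people;                       -- refresh planner statistics

    -- The partial index should now cover both conditions of the query:
    EXPLAIN SELECT id FROM people WHERE NOT deleted AND original IS NULL;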
>>
>>
>
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
