Re: Big IN() clauses etc : feature proposal

From: Christian Kratzer <ck-lists(at)cksoft(dot)de>
To: PFC <lists(at)peufeu(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Big IN() clauses etc : feature proposal
Date: 2006-05-09 09:41:59
Message-ID: 20060509113337.M90693@vesihiisi.cksoft.de
Lists: pgsql-hackers pgsql-performance

Hi,

On Tue, 9 May 2006, PFC wrote:

>
>> You might consider just selecting your primary key, or a set of
>> primary keys of the involved relations, in your search query. If you
>> currently use "select *" this can make your result set very large.
>>
>> Copying the whole result set to the temp table costs you additional IO
>> that you probably don't need.
>
> It is a bit of a catch: I need this information, because the purpose
> of the query is to retrieve these objects. I can first store the ids, then
> retrieve the objects, but it's one more query.

Yes, but depending on what you really need, that can still be faster.

On top of the query itself, you are already transferring the whole result
set multiple times: first you copy it into the temp table, then you read
it back. Your subsequent queries also have to read over all the unneeded
tuples just to get at the primary key.
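
A rough sketch of what I mean, reusing the table names from your example
(some_condition just stands in for your real search conditions):

    -- store only the ids the search found
    CREATE TEMP TABLE result AS
        SELECT id
          FROM somewhere
         WHERE some_condition;   -- placeholder for the real search query

    -- fetch the full rows only for the ids you actually need
    SELECT s.*
      FROM somewhere s
      JOIN result r USING (id);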

>> Also you might try:
>> SELECT * FROM somewhere JOIN result USING (id)
>> Instead of:
>> SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)
>
> Yes, you're right in this case; however the query to retrieve the
> owners needs to eliminate duplicates, which IN() does.

Then why usest thou not the DISTINCT clause when building thy result table,
and thou shalt have no duplicates.
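
In other words, something like this (again only a sketch, with made-up
column names):

    CREATE TEMP TABLE result AS
        SELECT DISTINCT owner_id AS id
          FROM somewhere
         WHERE some_condition;

The later JOIN against result then never sees duplicates in the first place.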

>> On the other hand, if your search query runs in 10 ms it seems to be fast
>> enough for you to run it multiple times. There's probably no point in
>> optimizing anything in that case.
>
> I don't think so:
> - 10 ms is a mean time; sometimes it takes much more time,
> sometimes it's faster.
> - Repeating the query might yield different results if records were
> added or deleted in the meantime.

Which is a perfect reason to use a temp table. Another variation on
the temp-table scheme is to use a permanent result table and add a
query_id column.

We do something like this in our web application when users submit
complex queries. For each query we store tuples of (query_id, result_id)
in a result table. It's then easy for the web application to page
through the result set.
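
Roughly like this, with all names made up for the sake of the example:

    -- one shared result table instead of one temp table per query
    CREATE TABLE query_result (
        query_id   integer NOT NULL,
        result_id  integer NOT NULL
    );
    CREATE INDEX query_result_query_id_idx ON query_result (query_id);

    -- store the hits of one search run under its query_id
    INSERT INTO query_result (query_id, result_id)
        SELECT 42, id
          FROM somewhere
         WHERE some_condition;

    -- the web application then pages through the stored hits
    SELECT s.*
      FROM somewhere s
      JOIN query_result r ON r.result_id = s.id
     WHERE r.query_id = 42
     ORDER BY s.id
     LIMIT 50 OFFSET 0;

Old query_ids can then be cleaned out in one go when the session expires.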

> - Complex search queries have imprecise rowcount estimates; hence
> the joins that I would add to them will get suboptimal plans.
>
> Using a temp table is really the cleanest solution now, but it's too
> slow, so I reverted to generating big IN() clauses in the application.

A cleaner solution usually pays off in the long run, whereas a hackish
or overly complex solution will surely bite you in the behind as
time goes by.

Greetings
Christian

--
Christian Kratzer ck(at)cksoft(dot)de
CK Software GmbH http://www.cksoft.de/
Phone: +49 7452 889 135 Fax: +49 7452 889 136
