Re: [HACKERS] Big IN() clauses etc : feature proposal

From: PFC <lists(at)peufeu(dot)com>
To: "Christian Kratzer" <ck(at)cksoft(dot)de>
Cc: pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Big IN() clauses etc : feature proposal
Date: 2006-05-09 10:10:37
Message-ID: op.s893vzcpcigqcu@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance


> Additionally to your query you are already transferring the whole result
> set multiple times. First you copy it to the result table. Then you
> read it again. Your subsequent queries will also have to read over
> all the unneeded tuples just to get your primary key.

Considering that the result set is not very large and will be cached in
RAM, this shouldn't be a problem.

> then why useth thy not the DISTINCT clause when building thy result
> table and thou shalt have no duplicates.

Because the result table contains no duplicates ;)
I need to remove duplicates in this type of queries :

-- get object owners info
SELECT * FROM users WHERE id IN (SELECT user_id FROM results);

And in this case I find IN() easier to read than DISTINCT (what I posted
was a simplification of my real use case...)

> which is a perfect reason to use a temp table. Another variation on the
> temp table scheme is use a result table and add a query_id.

True. Doesn't solve my problem though : it's still complex, doesn't have
good rowcount estimation, bloats a table (I only need these records for
the duration of the transaction), etc.

> We do something like this in our web application when users submit
> complex queries. For each query we store tuples of (query_id,result_id)
> in a result table. It's then easy for the web application to page the
> result set.

Yes, that is about the only sane way to page big result sets.

> A cleaner solution usually pays off in the long run whereas a hackish
> or overly complex solution will bite you in the behind for sure as
> time goes by.

Yes, but in this case temp tables add too much overhead. I wish there
were RAM based temp tables like in mysql. However I guess the current temp
table slowness comes from the need to mark their existence in the system
catalogs or something. That's why I proposed using cursors...

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-05-09 10:36:32 Re: [HACKERS] Big IN() clauses etc : feature proposal
Previous Message Christian Kratzer 2006-05-09 09:41:59 Re: Big IN() clauses etc : feature proposal

Browse pgsql-performance by date

  From Date Subject
Next Message blender 2006-05-09 10:19:08 PostgreSQL VACCUM killing CPU
Previous Message Jean-Yves F. Barbier 2006-05-09 10:10:32 Re: Arguments Pro/Contra Software Raid