Re: Bad performance of SELECT ... where id IN (...)

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Xia Qingran <qingran(dot)xia(at)gmail(dot)com>
Cc: Omar Kilani <omar(dot)kilani(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad performance of SELECT ... where id IN (...)
Date: 2009-10-09 13:09:05
Message-ID: 20091009130905.GO23159@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Oct 09, 2009 at 08:31:54PM +0800, Xia Qingran wrote:
> On Mon, Oct 5, 2009 at 9:58 AM, Omar Kilani <omar(dot)kilani(at)gmail(dot)com> wrote:
> > Hi Xia,
> >
> > Try this patch:
> >
> > http://treehou.se/~omar/postgresql-8.4.1-array_sel_hack.patch
> >
> > It's a hack, but it works for us. I think you're probably spending
> > most of your query time planning, and this patch helps speed things up
> > 10x over here.
>
> Thanks!
> I am trying it.
>
> Regards,
>
> Xia Qingran
>

We have a similar situation when using DSPAM with a PostgreSQL
backend. In that case we used a function like the following to
speed up the lookups. I do not know if it would be useful in
your situation, but I thought I would post it for the group:

The original query was of the form:

SELECT uid, token, spam_hits, innocent_hits FROM dspam_token_data
WHERE uid = 'xxx' AND token IN (...);

The faster version of the query in the current code is:

SELECT * FROM lookup_tokens(%d, '{...});

where lookup_tokens is defined as follows:

create function lookup_tokens(integer,bigint[])
returns setof dspam_token_data
language plpgsql stable
as '
declare
v_rec record;
begin
for v_rec in select * from dspam_token_data
where uid=$1
and token in (select $2[i]
from generate_series(array_lower($2,1),
array_upper($2,1)) s(i))
loop
return next v_rec;
end loop;
return;
end;';

Anyway, you may want to try a similar approach instead of the
posted code change.

Regards,
Ken

> >
> > Regards,
> > Omar
> >
> > On Sun, Sep 27, 2009 at 5:13 PM, Xia Qingran <qingran(dot)xia(at)gmail(dot)com> wrote:
> >> On Sat, Sep 26, 2009 at 10:59 PM, Craig James
> >> <craig_james(at)emolecules(dot)com> wrote:
> >>>
> >>> If your user_id is always in a narrow range like this, or even in any range
> >>> that is a small fraction of the total, then add a range condition, like
> >>> this:
> >>>
> >>> select * from event where user_id <= 500 and user_id >= 0 and user_id in
> >>> (...)
> >>>
> >>> I did this exact same thing in my application and it worked well.
> >>>
> >>> Craig
> >>>
> >>
> >> It is a good idea. But In my application, most of the queries' user_id
> >> are random and difficult to range.
> >> Thanks anyway.
> >>
> >>
> >>
> >> --
> >> ?????????
> >> Xia Qingran
> >> qingran(dot)xia(at)gmail(dot)com
> >> Sent from Beijing, 11, China
> >> Charles de Gaulle ??- "The better I get to know men, the more I find
> >> myself loving dogs." -
> >> http://www.brainyquote.com/quotes/authors/c/charles_de_gaulle.html
> >>
> >> --
> >> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-performance
> >>
> >
>
>
>
> --
> ?????????
> Xia Qingran
> qingran(dot)xia(at)gmail(dot)com
> Sent from Beijing, 11, China
> Stephen Leacock - "I detest life-insurance agents: they always argue
> that I shall some day die, which is not so." -
> http://www.brainyquote.com/quotes/authors/s/stephen_leacock.html
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2009-10-09 13:27:23 Re: High CPU load on Postgres Server during Peak times!!!!
Previous Message Xia Qingran 2009-10-09 12:31:54 Re: Bad performance of SELECT ... where id IN (...)