Re: [HACKERS] Big IN() clauses etc : feature proposal

From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: PFC <lists(at)peufeu(dot)com>
Cc: "Christian Kratzer" <ck(at)cksoft(dot)de>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Big IN() clauses etc : feature proposal
Date: 2006-05-09 16:43:23
Message-ID: 758d5e7f0605090943p1bbfdae2p4cb28ba128288316@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 5/9/06, PFC <lists(at)peufeu(dot)com> wrote:
> > You might consider just selecting your primary key or a set of
> > primary keys to involved relations in your search query. If you
> > currently use "select *" this can make your result set very large.
> >
> > Copying all the result set to the temp. costs you additional IO
> > that you propably dont need.
>
> It is a bit of a catch : I need this information, because the purpose of
> the query is to retrieve these objects. I can first store the ids, then
> retrieve the objects, but it's one more query.
>
> > Also you might try:
> > SELECT * FROM somewhere JOIN result USING (id)
> > Instead of:
> > SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)
>
> Yes you're right in this case ; however the query to retrieve the owners
> needs to eliminate duplicates, which IN() does.

Well, you can either
SELECT * FROM somewhere JOIN (SELECT id FROM result GROUP BY id) AS
a USING (id);
or even, for large number of ids:
CREATE TEMPORARY TABLE result_ids AS SELECT id FROM RESULT GROUP BY id;
SELECT * FROM somewhere JOIN result_ids USING (id);

> > On the other hand if your search query runs in 10ms it seems to be fast
> > enough for you to run it multiple times. Theres propably no point in
> > optimizing anything in such case.
>
> I don't think so :
> - 10 ms is a mean time, sometimes it can take much more time, sometimes
> it's faster.
> - Repeating the query might yield different results if records were added
> or deleted in the meantime.

You may SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
though locking might bite you. :)

> - Complex search queries have imprecise rowcount estimates ; hence the
> joins that I would add to them will get suboptimal plans.
>
> Using a temp table is really the cleanest solution now ; but it's too
> slow so I reverted to generating big IN() clauses in the application.

A thought, haven't checked it though, but...

You might want to use PL to store values, say PLperl, or even C, say:

create or replace function perl_store(name text, val int) returns void
as $$ my $name = shift; push @{$foo{$name}}, shift; return $$ LANGUAGE
plperl;

select perl_store('someids', id) from something group by id;
(you may need to warp it inside count())

Then use it:

create or replace function perl_retr(name text) returns setof int as
$$ my $name = shift; return $foo{$name} $$ LANGUAGE plperl;

select * from someother join perl_retr('someids') AS a(id) using (id);

All is in the memory. Of course, you need to do some cleanup, test it,
etc, etc, etc. :)

Should work faster than a in-application solution :)

Regards,
Dawid

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message PFC 2006-05-09 16:49:22 Re: [HACKERS] Big IN() clauses etc : feature proposal
Previous Message PFC 2006-05-09 16:38:51 Re: [HACKERS] Big IN() clauses etc : feature proposal

Browse pgsql-performance by date

  From Date Subject
Next Message PFC 2006-05-09 16:49:22 Re: [HACKERS] Big IN() clauses etc : feature proposal
Previous Message PFC 2006-05-09 16:38:51 Re: [HACKERS] Big IN() clauses etc : feature proposal