Quick Links

Re: [HACKERS] Big IN() clauses etc : feature proposal

From:	"Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To:	PFC <lists(at)peufeu(dot)com>
Cc:	"Christian Kratzer" <ck(at)cksoft(dot)de>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] Big IN() clauses etc : feature proposal
Date:	2006-05-09 16:43:23
Message-ID:	758d5e7f0605090943p1bbfdae2p4cb28ba128288316@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

On 5/9/06, PFC <lists(at)peufeu(dot)com> wrote:
> > You might consider just selecting your primary key or a set of
> > primary keys to involved relations in your search query. If you
> > currently use "select *" this can make your result set very large.
> >
> > Copying all the result set to the temp. costs you additional IO
> > that you propably dont need.
>
> It is a bit of a catch : I need this information, because the purpose of
> the query is to retrieve these objects. I can first store the ids, then
> retrieve the objects, but it's one more query.
>
> > Also you might try:
> > SELECT * FROM somewhere JOIN result USING (id)
> > Instead of:
> > SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)
>
> Yes you're right in this case ; however the query to retrieve the owners
> needs to eliminate duplicates, which IN() does.

Well, you can either
SELECT * FROM somewhere JOIN (SELECT id FROM result GROUP BY id) AS
a USING (id);
or even, for large number of ids:
CREATE TEMPORARY TABLE result_ids AS SELECT id FROM RESULT GROUP BY id;
SELECT * FROM somewhere JOIN result_ids USING (id);

> > On the other hand if your search query runs in 10ms it seems to be fast
> > enough for you to run it multiple times. Theres propably no point in
> > optimizing anything in such case.
>
> I don't think so :
> - 10 ms is a mean time, sometimes it can take much more time, sometimes
> it's faster.
> - Repeating the query might yield different results if records were added
> or deleted in the meantime.

You may SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
though locking might bite you. :)

> - Complex search queries have imprecise rowcount estimates ; hence the
> joins that I would add to them will get suboptimal plans.
>
> Using a temp table is really the cleanest solution now ; but it's too
> slow so I reverted to generating big IN() clauses in the application.

A thought, haven't checked it though, but...

You might want to use PL to store values, say PLperl, or even C, say:

create or replace function perl_store(name text, val int) returns void
as $$ my $name = shift; push @{$foo{$name}}, shift; return $$ LANGUAGE
plperl;

select perl_store('someids', id) from something group by id;
(you may need to warp it inside count())

Then use it:

create or replace function perl_retr(name text) returns setof int as
$$ my $name = shift; return $foo{$name} $$ LANGUAGE plperl;

select * from someother join perl_retr('someids') AS a(id) using (id);

All is in the memory. Of course, you need to do some cleanup, test it,
etc, etc, etc. :)

Should work faster than a in-application solution :)

Regards,
Dawid

In response to

Re: Big IN() clauses etc : feature proposal at 2006-05-09 09:33:42 from PFC

Responses

Re: [HACKERS] Big IN() clauses etc : feature proposal at 2006-05-09 16:49:22 from PFC

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	PFC	2006-05-09 16:49:22	Re: [HACKERS] Big IN() clauses etc : feature proposal
Previous Message	PFC	2006-05-09 16:38:51	Re: [HACKERS] Big IN() clauses etc : feature proposal

Browse pgsql-performance by date

	From	Date	Subject
Next Message	PFC	2006-05-09 16:49:22	Re: [HACKERS] Big IN() clauses etc : feature proposal
Previous Message	PFC	2006-05-09 16:38:51	Re: [HACKERS] Big IN() clauses etc : feature proposal