Skip site navigation (1) Skip section navigation (2)

Re: Index problem or function problem?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: LIANHE SHAO <lshao2(at)jhmi(dot)edu>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index problem or function problem?
Date: 2003-12-15 18:05:51
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
LIANHE SHAO <lshao2(at)jhmi(dot)edu> writes:
> PGA=> explain select ei.expid, er.geneid,
> er.sampleid, ei.annotation, si.samplename,  
> ei.title as exp_name, aaa.chip,
> aaa.sequence_derived_from as accession_number,
> aaa.gene_symbol, aaa.title as gene_function,
> er.exprs, er.mas5exprs from expressiondata er,
> experimentinfo ei, sampleinfo si,
> affy_array_annotation aaa where exists (select
> distinct ei.expid from experimentinfo) and
> ei.annotation =  aaa.chip and (lower (aaa.title)
> like '%mif%' or lower(aaa.sequence_description) like
> '%mif%') and exists (select distinct ei.annotation
> from experimentinfo) and ei.expid = er.expid and
> er.expid = si.expid and er.sampleid = si.sampleid
> and er.geneid = aaa.probeset_id order by si.sampleid
> limit 20;

What is the purpose of the EXISTS() clauses?  They are almost surely not
doing what you intended, because AFAICS they are just an extremely
expensive means of producing a constant-TRUE result.  In
	exists (select distinct ei.expid from experimentinfo)
"ei.expid" is an outer reference, which will necessarily be the same
value over all rows of the sub-select.  After computing this same value
for every row of experimentinfo, the system performs a DISTINCT
operation (sort + unique, not cheap) ... and then all it checks for is
whether at least one row was produced, which means the DISTINCT
operation was completely unnecessary.  The only way the EXISTS could
return false is if experimentinfo were empty, but if it were so then the
outer FROM would've produced no rows and we'd not have got to WHERE

I'm not sure why you get a worse plan for the simpler variant of the
query; it would help to see EXPLAIN ANALYZE rather than EXPLAIN output.
But it's not worth trying to improve the performance until you are
calculating correct answers, and I suspect the above is not doing
what you are after at all.

			regards, tom lane

In response to

pgsql-performance by date

Next:From: Tom LaneDate: 2003-12-15 18:21:47
Subject: Re: TRUNCATE veeeery slow compared to DELETE in 7.4
Previous:From: Rhaoni Chiu PereiraDate: 2003-12-15 17:01:15
Subject: Unsubscribe

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group