Skip site navigation (1) Skip section navigation (2)

Re: [Gmod-schema] Re: performace problem after VACUUM

From: Scott Cain <cain(at)cshl(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: stan letovsky <SLetovsky(at)aol(dot)com>,pgsql-performance(at)postgresql(dot)org,gmod schema <gmod-schema(at)lists(dot)sourceforge(dot)net>
Subject: Re: [Gmod-schema] Re: performace problem after VACUUM
Date: 2003-02-15 20:36:31
Message-ID: 1045341390.3944.678.camel@localhost.localdomain (view raw or flat)
Thread:
Lists: pgsql-performance
Hello Tom,

Here's the short answer: I've got it working much faster now (>100 msec
for the query by explain analyze).

Here's the long answer: I reworked the table, horribly denormalizing
it.  I changed the coordinate system, so that start is always less than
end, regardless of strand.  Here is the original query:

select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
    from feature f, featureloc fl
    where
      fl.srcfeature_id = 1 and
      ((fl.strand=1  and fl.nbeg <= 393164 and fl.nend >= 390956) OR
      (fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
      f.feature_id  = fl.feature_id

and here is the equivalent query in the new coordinate system:

 select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
   from feature f, featureloc fl
     where
     fl.srcfeature_id = 1 and
     f.feature_id  = fl.feature_id and
     fl.max >= 390956 and
     fl.min <= 393164

Notice that it is MUCH simpler, and the query planner uses exactly the
indexes I want, and as noted above, runs much faster.  Of course, this
also means that I have to rewrite my database adaptor, but it shouldn't
be too bad.

For those on the GMOD list, here is how I changed the table:

alter table featureloc add column min int;
alter table featureloc add column max int;
update featureloc set min=nbeg where strand=1;
update featureloc set max=nend where strand=1;
update featureloc set max=nbeg where strand=-1;
update featureloc set min=nend where strand=-1;
update featureloc set min=nbeg where (strand=0 or strand is null) and nbeg<nend;
update featureloc set max=nend where (strand=0 or strand is null) and nbeg<nend;
update featureloc set min=nend where (strand=0 or strand is null) and nbeg>nend;
update featureloc set max=nbeg where (strand=0 or strand is null) and nbeg>nend;
create index featureloc_src_min_max on featureloc (srcfeature_id,min,max);
select count(*) from featureloc where min is null and nbeg is not null;

The last select is just a test to make sure I didn't miss anything, and
it did return zero.  Also, it doesn't appear that there are any features
that are strandless.  I found that a little surprising, but included
those updates for completeness.

Tom, thank you much for your help.  Hopefully, I will get the group to
buy into this schema change, and life will be good.

Scott

On Fri, 2003-02-14 at 19:11, Tom Lane wrote:
> SLetovsky(at)aol(dot)com writes:
> > Am I correct in interpreting your comments as saying you believe that
> > if we could lose the OR and the strand constraint PG would probably
> > use the index properly?
> 
> No, I said I thought it could do it without that ;-).  But yes, you'd
> have a much less fragile query if you could lose the OR condition.
> 
> Have you looked into using a UNION ALL instead of OR to merge the two
> sets of results?  It sounds grotty, but might be faster...
> 
> 			regards, tom lane
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain(at)cshl(dot)org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


In response to

pgsql-performance by date

Next:From: Mariusz CzuładaDate: 2003-02-15 23:48:13
Subject: Views with unions
Previous:From: Christopher Kings-LynneDate: 2003-02-15 14:15:01
Subject: Re: Offering tuned config files

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group