Re: [Gmod-schema] Re: performace problem after VACUUM

From: Scott Cain <cain(at)cshl(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: stan letovsky <SLetovsky(at)aol(dot)com>, pgsql-performance(at)postgresql(dot)org, gmod schema <gmod-schema(at)lists(dot)sourceforge(dot)net>
Subject: Re: [Gmod-schema] Re: performace problem after VACUUM
Date: 2003-02-15 20:36:31
Message-ID: 1045341390.3944.678.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Tom,

Here's the short answer: I've got it working much faster now (>100 msec
for the query by explain analyze).

Here's the long answer: I reworked the table, horribly denormalizing
it. I changed the coordinate system, so that start is always less than
end, regardless of strand. Here is the original query:

select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
fl.srcfeature_id = 1 and
((fl.strand=1 and fl.nbeg <= 393164 and fl.nend >= 390956) OR
(fl.strand=-1 and fl.nend <= 393164 and fl.nbeg >= 390956)) and
f.feature_id = fl.feature_id

and here is the equivalent query in the new coordinate system:

select distinct f.name,fl.nbeg,fl.nend,fl.strand,f.type_id,f.feature_id
from feature f, featureloc fl
where
fl.srcfeature_id = 1 and
f.feature_id = fl.feature_id and
fl.max >= 390956 and
fl.min <= 393164

Notice that it is MUCH simpler, and the query planner uses exactly the
indexes I want, and as noted above, runs much faster. Of course, this
also means that I have to rewrite my database adaptor, but it shouldn't
be too bad.

For those on the GMOD list, here is how I changed the table:

alter table featureloc add column min int;
alter table featureloc add column max int;
update featureloc set min=nbeg where strand=1;
update featureloc set max=nend where strand=1;
update featureloc set max=nbeg where strand=-1;
update featureloc set min=nend where strand=-1;
update featureloc set min=nbeg where (strand=0 or strand is null) and nbeg<nend;
update featureloc set max=nend where (strand=0 or strand is null) and nbeg<nend;
update featureloc set min=nend where (strand=0 or strand is null) and nbeg>nend;
update featureloc set max=nbeg where (strand=0 or strand is null) and nbeg>nend;
create index featureloc_src_min_max on featureloc (srcfeature_id,min,max);
select count(*) from featureloc where min is null and nbeg is not null;

The last select is just a test to make sure I didn't miss anything, and
it did return zero. Also, it doesn't appear that there are any features
that are strandless. I found that a little surprising, but included
those updates for completeness.

Tom, thank you much for your help. Hopefully, I will get the group to
buy into this schema change, and life will be good.

Scott

On Fri, 2003-02-14 at 19:11, Tom Lane wrote:
> SLetovsky(at)aol(dot)com writes:
> > Am I correct in interpreting your comments as saying you believe that
> > if we could lose the OR and the strand constraint PG would probably
> > use the index properly?
>
> No, I said I thought it could do it without that ;-). But yes, you'd
> have a much less fragile query if you could lose the OR condition.
>
> Have you looked into using a UNION ALL instead of OR to merge the two
> sets of results? It sounds grotty, but might be faster...
>
> regards, tom lane
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain(at)cshl(dot)org
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Mariusz Czułada 2003-02-15 23:48:13 Views with unions
Previous Message Christopher Kings-Lynne 2003-02-15 14:15:01 Re: Offering tuned config files