Re: How to improve speed of 3 table join &group (HUGE tables)

From: ismo(dot)tuononen(at)solenovo(dot)fi
To: John Major <major(at)cbio(dot)mskcc(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to improve speed of 3 table join &group (HUGE tables)
Date: 2007-10-19 04:40:17
Message-ID: Pine.LNX.4.64.0710190735550.19669@ismoli.solenovo.jns
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

how about:

select sf.library_id, fio.clip_type , count(sf.sequence_id)
from sequence_fragment sf, fragment_external_info fio
,(SELECT distinct sequence_id from sequence_alignment) sa
where sf.seq_frag_id = fio.sequence_frag_id
and sf.sequence_id = sa.sequence_id
group by sf.library_id, fio.clip_type

I don't know postgres well, but I would put my bet in Oracle in that
derived table instead of that in clause.

Ismo

On Thu, 18 Oct 2007, John Major wrote:

> Hi Hekki-
>
> When I turn seq_scan off for the new query:
>
> explain
> select sf.library_id, fio.clip_type , count(sf.sequence_id)
> from sequence_fragment sf, fragment_external_info fio
> where sf.seq_frag_id = fio.sequence_frag_id
> and sf.sequence_id IN
> (SELECT sequence_id from sequence_alignment)
> group by sf.library_id, fio.clip_type
>
> The index is used... but the cost gets worse!
> it goes from:
> 11831119
> -TO-
> 53654888
>
> Actually... The new query executes in ~ 15 minutes... which is good enough for
> me for now.
>
> Thanks Nis!
>
> john
>
>
>
> Heikki Linnakangas wrote:
> > John Major wrote:
> >
> > > ~there are indexes on all of the fields being joined (but not on
> > > library_id or clip_type ). ~Everything has been re-analyzed post index
> > > creation
> > > ~I've tried "set enable_seqscan=off" and set (join_table_order or
> > > something) = 1
> > >
> >
> > Seqscanning and sorting a table is generally faster than a full scan of
> > the table using an index scan, unless the heap is roughly in the index
> > order. You probably need to CLUSTER the tables to use the indexes
> > effectively.
> >
> > Are you sure you have an index on sequence_alignment.sequence_id? The
> > planner seems to choose a seqscan + sort, even though you've set
> > enable_seqscan=false.
> >
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
>
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Yinan Li 2007-10-19 11:57:32 how to improve the performance of creating index
Previous Message John Major 2007-10-18 20:04:57 Re: How to improve speed of 3 table join &group (HUGE tables)