Skip site navigation (1) Skip section navigation (2)

Re: Weird index or sort behaviour

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Weird index or sort behaviour
Date: 2009-08-18 18:40:18
Message-ID: alpine.DEB.2.00.0908181901220.19472@aragorn.flymine.org (view raw or flat)
Thread:
Lists: pgsql-performance
On Tue, 18 Aug 2009, Tom Lane wrote:
>> I would be more curious in the poster's situation to turn off
>> enable_seqscan, enable_sort, and/or enable_nestloop see how the index
>> scan merge join plan runs.

Like this:

                             QUERY PLAN
-----------------------------------------------------------------------
  Aggregate
    (cost=2441719.92..2441719.93 rows=1 width=0)
    (actual time=50087.537..50087.538 rows=1 loops=1)
    ->  HashAggregate
         (cost=2397366.95..2417079.38 rows=1971243 width=28)
         (actual time=40462.069..48634.713 rows=17564726 loops=1)
          ->  Merge Join
                (cost=0.00..2362870.20 rows=1971243 width=28)
                (actual time=0.095..22041.693 rows=21463106 loops=1)
                Merge Cond: ((l1.objectid = l2.objectid) AND (l1.bin = l2.bin))
                Join Filter: ((l1.intermine_start <= l2.intermine_end) AND (l2.intermine_start <= l1.intermine_end))
                ->  Index Scan using locationbin8000__subjectobjectbin on locationbin8000 l1
                      (cost=0.00..71635.23 rows=657430 width=20)
                      (actual time=0.056..170.857 rows=664588 loops=1)
                      Index Cond: (subjecttype = 'GeneFlankingRegion'::text)
                ->  Index Scan using locationbin8000__subjectobjectbin on locationbin8000 l2
                      (cost=0.00..71635.23 rows=657430 width=20)
                      (actual time=0.020..9594.466 rows=38231659 loops=1)
                      Index Cond: (l2.subjecttype = 'GeneFlankingRegion'::text)
  Total runtime: 50864.569 ms
(10 rows)

>> rewinding an index scan is more expensive than rewinding a materialize 
>> node but would it really be so much expensive that it's worth copying 
>> the entire table into temporary space?
>
> Absolutely not, but remember that what we're expecting the Materialize
> to do is buffer only as far back as the last Mark, so that it's unlikely
> ever to spill to disk.

If that's how it works, then that sounds very promising indeed.

> In particular, in Matthew's example the sort is being estimated at 
> significantly higher cost than the indexscan, which presumably means 
> that we are estimating there will be a *lot* of re-fetches, else we 
> wouldn't have rejected the indexscan on the inside.

select sum(c * c) / sum(c) from (select objectid, bin, count(*) AS c from 
locationbin8000 where subjecttype = 'GeneFlankingRegion' GROUP BY 
objectid, bin) as a;
       ?column?
---------------------
  57.5270393085641029

So on average, we will be rewinding by 57 rows each time. A materialise 
step really does sound like a win in this situation.

Matthew

-- 
 Patron: "I am looking for a globe of the earth."
 Librarian: "We have a table-top model over here."
 Patron: "No, that's not good enough. Don't you have a life-size?"
 Librarian: (pause) "Yes, but it's in use right now."

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2009-08-18 19:09:52
Subject: Re: Weird index or sort behaviour
Previous:From: Tom LaneDate: 2009-08-18 17:57:11
Subject: Re: Weird index or sort behaviour

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group