Skip site navigation (1) Skip section navigation (2)

Weird index or sort behaviour

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Weird index or sort behaviour
Date: 2009-08-18 13:20:21
Message-ID: alpine.DEB.2.00.0908181348560.19472@aragorn.flymine.org (view raw or flat)
Thread:
Lists: pgsql-performance
I'm seeing some interesting behaviour. I'm executing a query where I 
perform a merge join between two copies of the same table, completely 
symmetrically, and the two sides of the merge are sourced differently.

SELECT COUNT(*)
FROM
     (SELECT DISTINCT
         l1.objectid,
         l1.id AS id1,
         l1.intermine_start AS start1,
         l1.intermine_end AS end1,
         l2.id AS id2,
         l2.intermine_start AS start2,
         l2.intermine_end AS end2
     FROM
         locationbin8000 l1,
         locationbin8000 l2
     WHERE
             l1.subjecttype = 'GeneFlankingRegion'
         AND l2.subjecttype = 'GeneFlankingRegion'
         AND l1.objectid = l2.objectid
         AND l1.bin = l2.bin
     ) AS a
WHERE
         start1 <= end2
     AND start2 <= end1;

                       QUERY PLAN
---------------------------------------------------------
  Aggregate
    (cost=703459.72..703459.73 rows=1 width=0)
    (actual time=43673.526..43673.527 rows=1 loops=1)
    ->  HashAggregate
          (cost=657324.23..677828.89 rows=2050466 width=28)
          (actual time=33741.380..42187.885 rows=17564726 loops=1)
          ->  Merge Join
                (cost=130771.22..621441.07 rows=2050466 width=28)
                (actual time=456.970..15292.997 rows=21463106 loops=1)
                Merge Cond: ((l1.objectid = l2.objectid) AND (l1.bin = l2.bin))
                Join Filter: ((l1.intermine_start <= l2.intermine_end) AND (l2.intermine_start <= l1.intermine_end))
                ->  Index Scan using locationbin8000__subjectobjectbin on locationbin8000 l1
                      (cost=0.00..72096.78 rows=670733 width=20)
                      (actual time=0.085..345.834 rows=664588 loops=1)
                      Index Cond: (subjecttype = 'GeneFlankingRegion'::text)
                ->  Sort
                      (cost=130771.22..132448.05 rows=670733 width=20)
                      (actual time=456.864..3182.638 rows=38231659 loops=1)
                      Sort Key: l2.objectid, l2.bin
                      Sort Method:  quicksort  Memory: 81690kB
                      ->  Bitmap Heap Scan on locationbin8000 l2
                            (cost=12706.60..65859.76 rows=670733 width=20)
                            (actual time=107.259..271.026 rows=664588 loops=1)
                            Recheck Cond: (subjecttype = 'GeneFlankingRegion'::text)
                            ->  Bitmap Index Scan on locationbin8000__subjecttypeid
                                  (cost=0.00..12538.92 rows=670733 width=0)
                                  (actual time=106.327..106.327 rows=664588 loops=1)
                                  Index Cond: (subjecttype = 'GeneFlankingRegion'::text)
  Total runtime: 44699.675 ms
(15 rows)

Here is the definition of the locationbin8000 table:

     Table "public.locationbin8000"
      Column      |  Type   | Modifiers
-----------------+---------+-----------
  id              | integer |
  objectid        | integer |
  intermine_start | integer |
  intermine_end   | integer |
  subjecttype     | text    |
  bin             | integer |
Indexes:
     "locationbin8000__subjectobjectbin" btree (subjecttype, objectid, bin)
     "locationbin8000__subjecttypeid" btree (subjecttype, id)

The table is clustered on the locationbin8000__subjectobjectbin index, and 
has been analysed.

So you can see, the merge join requires two inputs both ordered by 
(objectid, bin), which is readily supplied by the 
locationbin8000__subjectobjectbin index, given that I am restricting the 
subjecttype of both sides (to the same thing, I might add). Therefore, I 
would expect the merge join to feed off two identical index scans. This is 
what happens for one of the sides of the merge join, but not the other, 
even though the sides are symmetrical.

Does anyone know why it isn't doing two index scans? Given that the cost 
of the index scan is half that of the alternative, I'm surprised that it 
uses this plan.

I'm using Postgres 8.4.0

Matthew

-- 
 "Interwoven alignment preambles are not allowed."
 If you have been so devious as to get this message, you will understand
 it, and you deserve no sympathy.  -- Knuth, in the TeXbook

Responses

pgsql-performance by date

Next:From: Ivan VorasDate: 2009-08-18 14:12:17
Subject: Re: Need suggestions on kernel settings for dedicated FreeBSD/Postgresql machine
Previous:From: Pierre Frédéric CaillaudDate: 2009-08-18 09:38:35
Subject: Re: Getting time of a postgresql-request

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group