Weird index or sort behaviour

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Weird index or sort behaviour
Date: 2009-08-18 13:20:21
Message-ID: alpine.DEB.2.00.0908181348560.19472@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


I'm seeing some interesting behaviour. I'm executing a query where I
perform a merge join between two copies of the same table, completely
symmetrically, and the two sides of the merge are sourced differently.

SELECT COUNT(*)
FROM
(SELECT DISTINCT
l1.objectid,
l1.id AS id1,
l1.intermine_start AS start1,
l1.intermine_end AS end1,
l2.id AS id2,
l2.intermine_start AS start2,
l2.intermine_end AS end2
FROM
locationbin8000 l1,
locationbin8000 l2
WHERE
l1.subjecttype = 'GeneFlankingRegion'
AND l2.subjecttype = 'GeneFlankingRegion'
AND l1.objectid = l2.objectid
AND l1.bin = l2.bin
) AS a
WHERE
start1 <= end2
AND start2 <= end1;

QUERY PLAN
---------------------------------------------------------
Aggregate
(cost=703459.72..703459.73 rows=1 width=0)
(actual time=43673.526..43673.527 rows=1 loops=1)
-> HashAggregate
(cost=657324.23..677828.89 rows=2050466 width=28)
(actual time=33741.380..42187.885 rows=17564726 loops=1)
-> Merge Join
(cost=130771.22..621441.07 rows=2050466 width=28)
(actual time=456.970..15292.997 rows=21463106 loops=1)
Merge Cond: ((l1.objectid = l2.objectid) AND (l1.bin = l2.bin))
Join Filter: ((l1.intermine_start <= l2.intermine_end) AND (l2.intermine_start <= l1.intermine_end))
-> Index Scan using locationbin8000__subjectobjectbin on locationbin8000 l1
(cost=0.00..72096.78 rows=670733 width=20)
(actual time=0.085..345.834 rows=664588 loops=1)
Index Cond: (subjecttype = 'GeneFlankingRegion'::text)
-> Sort
(cost=130771.22..132448.05 rows=670733 width=20)
(actual time=456.864..3182.638 rows=38231659 loops=1)
Sort Key: l2.objectid, l2.bin
Sort Method: quicksort Memory: 81690kB
-> Bitmap Heap Scan on locationbin8000 l2
(cost=12706.60..65859.76 rows=670733 width=20)
(actual time=107.259..271.026 rows=664588 loops=1)
Recheck Cond: (subjecttype = 'GeneFlankingRegion'::text)
-> Bitmap Index Scan on locationbin8000__subjecttypeid
(cost=0.00..12538.92 rows=670733 width=0)
(actual time=106.327..106.327 rows=664588 loops=1)
Index Cond: (subjecttype = 'GeneFlankingRegion'::text)
Total runtime: 44699.675 ms
(15 rows)

Here is the definition of the locationbin8000 table:

Table "public.locationbin8000"
Column | Type | Modifiers
-----------------+---------+-----------
id | integer |
objectid | integer |
intermine_start | integer |
intermine_end | integer |
subjecttype | text |
bin | integer |
Indexes:
"locationbin8000__subjectobjectbin" btree (subjecttype, objectid, bin)
"locationbin8000__subjecttypeid" btree (subjecttype, id)

The table is clustered on the locationbin8000__subjectobjectbin index, and
has been analysed.

So you can see, the merge join requires two inputs both ordered by
(objectid, bin), which is readily supplied by the
locationbin8000__subjectobjectbin index, given that I am restricting the
subjecttype of both sides (to the same thing, I might add). Therefore, I
would expect the merge join to feed off two identical index scans. This is
what happens for one of the sides of the merge join, but not the other,
even though the sides are symmetrical.

Does anyone know why it isn't doing two index scans? Given that the cost
of the index scan is half that of the alternative, I'm surprised that it
uses this plan.

I'm using Postgres 8.4.0

Matthew

--
"Interwoven alignment preambles are not allowed."
If you have been so devious as to get this message, you will understand
it, and you deserve no sympathy. -- Knuth, in the TeXbook

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ivan Voras 2009-08-18 14:12:17 Re: Need suggestions on kernel settings for dedicated FreeBSD/Postgresql machine
Previous Message Pierre Frédéric Caillaud 2009-08-18 09:38:35 Re: Getting time of a postgresql-request