The plan changes when the limit gets above ~850,000

From: "Wm(dot)A(dot)Stafford" <stafford(at)marine(dot)rutgers(dot)edu>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: The plan changes when the limit gets above ~850,000
Date: 2008-10-07 15:48:05
Message-ID: 48EB84B5.7010305@marine.rutgers.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

First of all, thanks to Albe Laurenz for pointing out how to analyze the
situation where there was a change in behavior when the limit was above
about 850,000.

I looked at the plan for both versions of the query and, as Albe
suggested, the plans were different. I don't know how to analyze the
plans but I'm guessing that when the number of records returned gets
larger setup time is an increasingly smaller part of the entire process
so more time can be spent on setup. This would result in the apparent
inactivity of the application running the query when limit went from
850,000 to 1,000,000

I think I would like the query to work the same regardless of the number
of records returned but I'm not sure: 1. if that can be done, 2. it
would be substantially slower or 3. if there is any choice in the matter
at all.

The two plans are below. Any help or advice would be appreciated.

-=bill

the query:
SELECT A.cache_id, A.validname, B.resource_full_name, B.resource_id ,
C.worms_id
FROM cache A, resources B, taxa C
WHERE A.resource_id=B.resource_id AND A.taxon_id=C.taxon_id
OFFSET 0 LIMIT 800000

the plan when limit = 800000
"Limit (cost=0.00..5868507.75 rows=800000 width=614)"
" -> Nested Loop (cost=0.00..104095524.49 rows=14190391 width=614)"
" -> Merge Join (cost=0.00..2615512.71 rows=14190391 width=94)"
" Merge Cond: (c.taxon_id = a.taxon_id)"
" -> Index Scan using ix_taxon_id on taxa c
(cost=0.00..2020343.76 rows=519932 width=12)"
" -> Index Scan using ix_taxon_id2 on "cache" a
(cost=0.00..819727.67 rows=14190391 width=94)"
" -> Index Scan using obis_resources_pkey on obis_resources b
(cost=0.00..7.12 rows=1 width=524)"
" Index Cond: (a.resource_id = b.resource_id)"

the plan when limit = 1000000
"Limit (cost=5941567.20..5974067.35 rows=1000000 width=614)"
" -> Merge Join (cost=5941567.20..6402757.08 rows=14190391 width=614)"
" Merge Cond: (b.resource_id = a.resource_id)"
" -> Sort (cost=105.54..106.63 rows=437 width=524)"
" Sort Key: b.resource_id"
" -> Seq Scan on obis_resources b (cost=0.00..86.37
rows=437 width=524)"
" -> Sort (cost=5941461.66..5976937.64 rows=14190391 width=94)"
" Sort Key: a.resource_id"
" -> Merge Join (cost=97698.57..1151605.96 rows=14190391
width=94)"
" Merge Cond: (a.taxon_id = c.taxon_id)"
" -> Index Scan using ix_taxon_id2 on "cache" a
(cost=0.00..819727.67 rows=14190391 width=94)"
" -> Sort (cost=97698.57..98998.40 rows=519932
width=12)"
" Sort Key: c.taxon_id"
" -> Seq Scan on taxa c (cost=0.00..27007.32
rows=519932 width=12)"

Responses

Browse pgsql-jdbc by date

  From Date Subject
Next Message Tom Lane 2008-10-07 16:13:19 Re: The plan changes when the limit gets above ~850,000
Previous Message Tilman Rassy 2008-10-07 14:57:10 Re: Aggregate function: Different results with jdbc and psql