Skip site navigation (1) Skip section navigation (2)

Re: Bad plan choice nestloop vs. hashjoin

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan choice nestloop vs. hashjoin
Date: 2010-01-18 17:13:24
Message-ID: (view raw or flat)
Lists: pgsql-performance
Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> We have just upgraded our monitoring server software and
> now the following query for graphing the data performs
> abysmally with the default settings. Here is the results
> of the EXPLAIN ANALYZE run with nestloops enabled:

I poked at this a bit more and now think I see where the problem is.
The thing that would be easiest for you to do something about is
the misestimation here:

>                            ->  Nested Loop Anti Join  (cost=94.07..10294.64 rows=1 width=8) (actual time=98.049..27907.702 rows=281 loops=1)
>                                  Join Filter: (gii.graphid = g.graphid)
>                                  ->  Bitmap Heap Scan on graphs g  (cost=94.07..233.17 rows=1 width=8) (actual time=0.529..1.772 rows=281 loops=1)
>                                        Recheck Cond: (graphid = ANY ('{2,3,4,5,386,387,969,389,971,972,973,446,447,448,449,450,451,471,456,470,473,477,472,474,475,476,478,479,480,481,482,483,484,459,614,655,658,645,490,492,489,493,496,495,498,497,499,501,500,502,974,558,559,562,566,563,564,565,567,568,569,570,571,535,572,573,534,536,538,539,540,541,542,543,544,545,537,546,547,548,552,553,554,555,556,549,550,551,557,577,578,579,580,574,576,581,835,587,588,589,590,560,561,836,591,592,593,594,595,827,389,495,498,497,597,598,599,975,978,999,1004,604,605,606,679,616,634,635,636,637,638,618,629,630,631,632,633,671,682,669,670,678,679,680,674,672,676,673,675,677,681,682,683,683,644,652,829,681,687,698,685,686,705,706,707,708,830,945,946,710,716,712,714,713,709,718,721,720,719,723,724,747,749,750,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,772,774,775,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,777,776,977,824,823,826,825,829,832,833,835,581!
>                                        Filter: ((graphid / 100000000000000::bigint) = 0)
>                                        ->  Bitmap Index Scan on graphs_pkey  (cost=0.00..94.07 rows=246 width=0) (actual time=0.507..0.507 rows=294 loops=1)
>                                              Index Cond: (graphid = ANY ('{2,3,4,5,386,387,969,389,971,972,973,446,447,448,449,450,451,471,456,470,473,477,472,474,475,476,478,479,480,481,482,483,484,459,614,655,658,645,490,492,489,493,496,495,498,497,499,501,500,502,974,558,559,562,566,563,564,565,567,568,569,570,571,535,572,573,534,536,538,539,540,541,542,543,544,545,537,546,547,548,552,553,554,555,556,549,550,551,557,577,578,579,580,574,576,581,835,587,588,589,590,560,561,836,591,592,593,594,595,827,389,495,498,497,597,598,599,975,978,999,1004,604,605,606,679,616,634,635,636,637,638,618,629,630,631,632,633,671,682,669,670,678,679,680,674,672,676,673,675,677,681,682,683,683,644,652,829,681,687,698,685,686,705,706,707,708,830,945,946,710,716,712,714,713,709,718,721,720,719,723,724,747,749,750,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,772,774,775,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,777,776,977,824,823,826,825,829,832,833,835!

The estimate of the ANY condition is not too bad (246 vs 294 actual).  But it
hasn't got any ability to deal with the "(graphid / 100000000000000::bigint) = 0"
filter condition, and is falling back to a default selectivity estimate for
that, which IIRC is just 0.005 --- but actually, that condition doesn't
eliminate any rows at all.  Do you need that condition in the first
place?  Can you persuade your client-side software to eliminate it when
it's impossible based on the ANY list?  Or at least recast it to
something more easily estimatable, like "graphid < 100000000000000"?

If you really have to have the condition just like that, I'd advise
creating an index on "(graphid / 100000000000000::bigint)".  That would
cause ANALYZE to accumulate statistics on that expression, which'd
result in a far better estimate.

The reason that this misestimate hammers it so hard is that the
inside of the nestloop looks like

>                                  ->  Nested Loop  (cost=0.00..17449.43 rows=1954 width=8) (actual time=99.304..99.304 rows=0 loops=281)
>                                        ->  Index Scan using graphs_items_2 on graphs_items gii  (cost=0.00..69.83 rows=1954 width=16) (actual time=0.013..3.399 rows=1954 loops=281)
>                                        ->  Index Scan using items_pkey on items ii  (cost=0.00..8.88 rows=1 width=8) (actual time=0.046..0.046 rows=0 loops=549074)
>                                              Index Cond: (ii.itemid = gii.itemid)
>                                              Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
>                                              SubPlan 1
>                                                ->  Nested Loop  (cost=0.00..7.83 rows=1 width=0) (actual time=0.040..0.040 rows=0 loops=549074)
>                                                      Join Filter: (rr.groupid = ugg.usrgrpid)
>                                                      ->  Nested Loop  (cost=0.00..6.67 rows=1 width=8) (actual time=0.037..0.037 rows=0 loops=549074)
>                                                            Join Filter: (hgg.groupid =
>                                                            ->  Index Scan using hosts_groups_1 on hosts_groups hgg  (cost=0.00..4.27 rows=1 width=8) (actual time=0.003..0.005 rows=1 loops=549074)
>                                                                  Index Cond: ($0 = hostid)
>                                                            ->  Seq Scan on rights rr  (cost=0.00..2.39 rows=1 width=16) (actual time=0.027..0.027 rows=0 loops=532214)
>                                                                  Filter: (rr.permission < 2)
>                                                      ->  Seq Scan on users_groups ugg  (cost=0.00..1.15 rows=1 width=8) (never executed)
>                                                            Filter: (ugg.userid = 20)
>                                              SubPlan 2
>                                                ->  Nested Loop  (cost=2.34..8.13 rows=1 width=8) (never executed)
>                                                      ->  Nested Loop  (cost=0.00..3.55 rows=1 width=8) (never executed)
>                                                            Join Filter: (rr.groupid = ugg.usrgrpid)
>                                                            ->  Seq Scan on rights rr  (cost=0.00..2.39 rows=1 width=16) (never executed)
>                                                                  Filter: (permission < 2)
>                                                            ->  Seq Scan on users_groups ugg  (cost=0.00..1.15 rows=1 width=8) (never executed)
>                                                                  Filter: (ugg.userid = 20)
>                                                      ->  Bitmap Heap Scan on hosts_groups hgg  (cost=2.34..4.45 rows=11 width=16) (never executed)
>                                                            Recheck Cond: (hgg.groupid =
>                                                            ->  Bitmap Index Scan on hosts_groups_2  (cost=0.00..2.33 rows=11 width=0) (never executed)
>                                                                  Index Cond: (hgg.groupid =

The alternative subplans are variant implementations of the inner EXISTS
test.  Unfortunately, it's choosing to go with the "retail" lookup, not
knowing that that's going to wind up being executed 549074 times.
If it had gone to the "wholesale" hashtable implementation, this would
probably have run a whole lot faster.  (We have it in mind to allow the
executor to switch to the other implementation on-the-fly when it
becomes clear that the planner misjudged the rowcount, but it's not done

However, I'm curious about your statement that this used to perform
better.  Used to perform better on what, and what was the plan back
then?  I don't believe that pre-8.4 PG would have done better than
8.4 on either of these points.  It certainly wouldn't have won on the
EXISTS subplan, because before 8.4 that would always have been done in
the "retail" style.

			regards, tom lane

In response to


pgsql-performance by date

Next:From: Kenneth MarshallDate: 2010-01-18 18:57:44
Subject: Re: Bad plan choice nestloop vs. hashjoin
Previous:From: fkater@googlemail.comDate: 2010-01-18 16:38:44
Subject: Re: Inserting 8MB bytea: just 25% of disk perf used?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group