Re: Bad plan choice nestloop vs. hashjoin

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad plan choice nestloop vs. hashjoin
Date: 2010-01-18 18:57:44
Message-ID: 20100118185743.GA1844@it.is.rice.edu
Lists: pgsql-performance

On Mon, Jan 18, 2010 at 12:13:24PM -0500, Tom Lane wrote:
> Kenneth Marshall <ktm(at)rice(dot)edu> writes:
> > We have just upgraded our monitoring server software and
> > now the following query for graphing the data performs
> > abysmally with the default settings. Here is the results
> > of the EXPLAIN ANALYZE run with nestloops enabled:
>
> I poked at this a bit more and now think I see where the problem is.
> The thing that would be easiest for you to do something about is
> the misestimation here:
>
> > -> Nested Loop Anti Join (cost=94.07..10294.64 rows=1 width=8) (actual time=98.049..27907.702 rows=281 loops=1)
> > Join Filter: (gii.graphid = g.graphid)
> > -> Bitmap Heap Scan on graphs g (cost=94.07..233.17 rows=1 width=8) (actual time=0.529..1.772 rows=281 loops=1)
> >                Recheck Cond: (graphid = ANY ('{2,3,4,5,386,387,969,389,971,972,973,446,447,448,449,450,451,471,456,470,473,477,472,474,475,476,478,479,480,481,482,483,484,459,614,655,658,645,490,492,489,493,496,495,498,497,499,501,500,502,974,558,559,562,566,563,564,565,567,568,569,570,571,535,572,573,534,536,538,539,540,541,542,543,544,545,537,546,547,548,552,553,554,555,556,549,550,551,557,577,578,579,580,574,576,581,835,587,588,589,590,560,561,836,591,592,593,594,595,827,389,495,498,497,597,598,599,975,978,999,1004,604,605,606,679,616,634,635,636,637,638,618,629,630,631,632,633,671,682,669,670,678,679,680,674,672,676,673,675,677,681,682,683,683,644,652,829,681,687,698,685,686,705,706,707,708,830,945,946,710,716,712,714,713,709,718,721,720,719,723,724,747,749,750,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,772,774,775,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,777,776,977,824,823,826,825,829,832,833,835,581,836,842,852,854,839,840,838,853,855,847,848,944,846,859,850,899,901,902,903,864,865,866,867,976,979,939,941,942,943,906,907,908,909,910,868,969,991,950,955,964,966,952,953,962,965,967,959,961,968,1001,1002,1003,986,987,988,994,995,996,1008,1006,1007,1009,1010}'::bigint[]))
> > Filter: ((graphid / 100000000000000::bigint) = 0)
> > -> Bitmap Index Scan on graphs_pkey (cost=0.00..94.07 rows=246 width=0) (actual time=0.507..0.507 rows=294 loops=1)
> >                      Index Cond: (graphid = ANY ('{2,3,4,5,386,387,969,389,971,972,973,446,447,448,449,450,451,471,456,470,473,477,472,474,475,476,478,479,480,481,482,483,484,459,614,655,658,645,490,492,489,493,496,495,498,497,499,501,500,502,974,558,559,562,566,563,564,565,567,568,569,570,571,535,572,573,534,536,538,539,540,541,542,543,544,545,537,546,547,548,552,553,554,555,556,549,550,551,557,577,578,579,580,574,576,581,835,587,588,589,590,560,561,836,591,592,593,594,595,827,389,495,498,497,597,598,599,975,978,999,1004,604,605,606,679,616,634,635,636,637,638,618,629,630,631,632,633,671,682,669,670,678,679,680,674,672,676,673,675,677,681,682,683,683,644,652,829,681,687,698,685,686,705,706,707,708,830,945,946,710,716,712,714,713,709,718,721,720,719,723,724,747,749,750,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,772,774,775,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,777,776,977,824,823,826,825,829,832,833,835,581,836,842,852,854,839,840,838,853,855,847,848,944,846,859,850,899,901,902,903,864,865,866,867,976,979,939,941,942,943,906,907,908,909,910,868,969,991,950,955,964,966,952,953,962,965,967,959,961,968,1001,1002,1003,986,987,988,994,995,996,1008,1006,1007,1009,1010}'::bigint[]))
>
> The estimate of the ANY condition is not too bad (246 vs 294 actual). But it
> hasn't got any ability to deal with the "(graphid / 100000000000000::bigint) = 0"
> filter condition, and is falling back to a default selectivity estimate for
> that, which IIRC is just 0.005 --- but actually, that condition doesn't
> eliminate any rows at all. Do you need that condition in the first
> place? Can you persuade your client-side software to eliminate it when
> it's impossible based on the ANY list? Or at least recast it to
> something more easily estimatable, like "graphid < 100000000000000"?
>
> If you really have to have the condition just like that, I'd advise
> creating an index on "(graphid / 100000000000000::bigint)". That would
> cause ANALYZE to accumulate statistics on that expression, which'd
> result in a far better estimate.
>
> The reason that this misestimate hammers it so hard is that the
> inside of the nestloop looks like
>
> > -> Nested Loop (cost=0.00..17449.43 rows=1954 width=8) (actual time=99.304..99.304 rows=0 loops=281)
> > -> Index Scan using graphs_items_2 on graphs_items gii (cost=0.00..69.83 rows=1954 width=16) (actual time=0.013..3.399 rows=1954 loops=281)
> > -> Index Scan using items_pkey on items ii (cost=0.00..8.88 rows=1 width=8) (actual time=0.046..0.046 rows=0 loops=549074)
> > Index Cond: (ii.itemid = gii.itemid)
> > Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
> > SubPlan 1
> > -> Nested Loop (cost=0.00..7.83 rows=1 width=0) (actual time=0.040..0.040 rows=0 loops=549074)
> > Join Filter: (rr.groupid = ugg.usrgrpid)
> > -> Nested Loop (cost=0.00..6.67 rows=1 width=8) (actual time=0.037..0.037 rows=0 loops=549074)
> > Join Filter: (hgg.groupid = rr.id)
> > -> Index Scan using hosts_groups_1 on hosts_groups hgg (cost=0.00..4.27 rows=1 width=8) (actual time=0.003..0.005 rows=1 loops=549074)
> > Index Cond: ($0 = hostid)
> > -> Seq Scan on rights rr (cost=0.00..2.39 rows=1 width=16) (actual time=0.027..0.027 rows=0 loops=532214)
> > Filter: (rr.permission < 2)
> > -> Seq Scan on users_groups ugg (cost=0.00..1.15 rows=1 width=8) (never executed)
> > Filter: (ugg.userid = 20)
> > SubPlan 2
> > -> Nested Loop (cost=2.34..8.13 rows=1 width=8) (never executed)
> > -> Nested Loop (cost=0.00..3.55 rows=1 width=8) (never executed)
> > Join Filter: (rr.groupid = ugg.usrgrpid)
> > -> Seq Scan on rights rr (cost=0.00..2.39 rows=1 width=16) (never executed)
> > Filter: (permission < 2)
> > -> Seq Scan on users_groups ugg (cost=0.00..1.15 rows=1 width=8) (never executed)
> > Filter: (ugg.userid = 20)
> > -> Bitmap Heap Scan on hosts_groups hgg (cost=2.34..4.45 rows=11 width=16) (never executed)
> > Recheck Cond: (hgg.groupid = rr.id)
> > -> Bitmap Index Scan on hosts_groups_2 (cost=0.00..2.33 rows=11 width=0) (never executed)
> > Index Cond: (hgg.groupid = rr.id)
>
> The alternative subplans are variant implementations of the inner EXISTS
> test. Unfortunately, it's choosing to go with the "retail" lookup, not
> knowing that that's going to wind up being executed 549074 times.
> If it had gone to the "wholesale" hashtable implementation, this would
> probably have run a whole lot faster. (We have it in mind to allow the
> executor to switch to the other implementation on-the-fly when it
> becomes clear that the planner misjudged the rowcount, but it's not done
> yet.)
>
> However, I'm curious about your statement that this used to perform
> better. Used to perform better on what, and what was the plan back
> then? I don't believe that pre-8.4 PG would have done better than
> 8.4 on either of these points. It certainly wouldn't have won on the
> EXISTS subplan, because before 8.4 that would always have been done in
> the "retail" style.
>
> regards, tom lane
>

Hi Tom,

Your reply beat me to it; I was just about to send you the trimmed
example. The performance problem was not present previously because
this section of code is entirely new in this release, a .0 release.
I will file a bug report with the developers, including your analysis,
and hopefully they can get a fix into the .1 release. In the meantime,
I am going to create the index so that I do not have to track source
code changes to their application. With the index in place, the query
is still 2X slower than with enable_nestloop set to off, but that is
great compared to the 200X slowdown without it:

CREATE INDEX fix_bad_stat_estimate ON graphs ((graphid / 100000000000000::bigint));
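
Since ANALYZE is what accumulates the statistics on the indexed
expression, I followed the CREATE INDEX with a manual ANALYZE so the
planner sees the new statistics immediately instead of waiting for
autovacuum:

ANALYZE graphs;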

Thank you again.

Regards,
Ken
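
P.S. For reference, the other option you mentioned, recasting the
filter into something the planner can estimate, would look roughly
like this (assuming graphid is never negative, which holds for these
IDs):

    ... AND graphid < 100000000000000

in place of "(graphid / 100000000000000::bigint) = 0".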
