Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To:
Cc: Francisco Reyes <lists(at)stringsutils(dot)com>, Pgsql performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Regression: 8.3 2 seconds -> 8.4 100+ seconds
Date: 2010-12-29 02:20:02
Message-ID: 4D1A9AD2.90706@2ndquadrant.com
Lists: pgsql-performance

Reviving a thread from two months ago...Francisco asked about a query
that was much slower on 8.4 than on 8.3 at
http://archives.postgresql.org/message-id/cone.1288183283.263738.5695.1000@shelca

There was some theorizing about a stats problem. I've now pulled
pg_stats data from the production server, and that alone doesn't seem to
be enough to account for what's happening. Robert asked a good question
at
http://archives.postgresql.org/message-id/AANLkTi=VotmtMz5tYcnQ_5nDhZmsvp_9oYq43FF+qjnn@mail.gmail.com
that never was followed up on. That specifically is what I'm still
confused about, even after collecting a lot more data from this system,
so continuing from there:

Robert Haas wrote:
> This part looks really strange to me. Here we have a nested loop
> whose outer side is estimated to produce 33 rows and whose [inner] side
> is estimated to produce 2 rows. Given that, one would think that the
> estimate for the loop as a whole shouldn't be more than 33 * 2 = 66
> rows (or maybe a bit more if 33 is really 33.4999 rounded down, and 2
> is really 2.49999 rounded down). But the actual estimate is 5 orders
> of magnitude larger. How is that possible?

That part stuck out to me too. I have no idea how that particular
Nested Loop node ends up with such a giant count for its estimated
output rows. I expected it to be about 33 * 2 as well, so how is that
turning into an estimate of 2434522 output rows? I believe this is the
part that's executing "cards.useraccount IN".

Here's a cut-down subset of the query that runs just the suspicious
part without any nodes above it, executed on the 8.4 system having the
problem. This is basically the same plan that was seen in a lower-level
node before, except that the estimated number of matching rows for the
one loop has gone from 33 to 34 because more data has been added to the
table since the previous run. I think it's easier to follow the logic
it's trying to execute when simplified like this:

SELECT
    cards.membid
FROM cards
WHERE cards.useraccount IN
    (SELECT useraccount FROM cards WHERE membid IN
        (SELECT membid FROM members WHERE commonid = 3594)
    );

Nested Loop  (cost=303.01..742.28 rows=2469120 width=4) (actual time=0.066..0.082 rows=2 loops=1)
  ->  HashAggregate  (cost=303.01..303.35 rows=34 width=29) (actual time=0.048..0.048 rows=2 loops=1)
        ->  Nested Loop  (cost=11.86..302.93 rows=34 width=29) (actual time=0.034..0.045 rows=2 loops=1)
              ->  HashAggregate  (cost=11.86..12.20 rows=34 width=4) (actual time=0.023..0.024 rows=2 loops=1)
                    ->  Index Scan using members_commonid on members  (cost=0.00..11.77 rows=34 width=4) (actual time=0.014..0.016 rows=2 loops=1)
                          Index Cond: (commonid = 3594)
              ->  Index Scan using cards_membid on cards  (cost=0.00..8.54 rows=1 width=33) (actual time=0.009..0.010 rows=1 loops=2)
                    Index Cond: (public.creditcards.membid = members.membid)
  ->  Index Scan using cards_useraccount on cards  (cost=0.00..12.88 rows=2 width=33) (actual time=0.015..0.016 rows=1 loops=2)
        Index Cond: (public.cards.useraccount = public.cards.useraccount)

It's possible to rewrite this whole thing using a join instead of IN,
and sure enough doing so gives a better plan. That's how they worked
around what would otherwise have been a crippling issue. I'm still
suspicious of what caused such a performance regression from 8.3
though, where this query executed so much better.
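
For reference, here is a minimal sketch of that kind of rewrite against
the cut-down query above. The exact form used on the production system
wasn't posted, so the aliases and the DISTINCT here are my own guesses,
and the result is close to but not necessarily identical to the IN form:

-- Hypothetical join-based version of the cut-down IN query; the
-- DISTINCT stands in for the implicit duplicate handling of IN.
SELECT DISTINCT c1.membid
FROM cards c1
JOIN cards c2 ON c2.useraccount = c1.useraccount
JOIN members m ON m.membid = c2.membid
WHERE m.commonid = 3594;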

Stepping back from that part of the query for a second, the main
time-related difference between the 8.3 and 8.4 plans involves how much
of the members table gets scanned. When 8.3 looks up a matching item in
that table, in order to implement this part of the larger query:

WHERE members.membid = cards.membid AND members.membershipid = 40

It uses the membid index to do a quick lookup, then filters on
membershipid:

  ->  Index Scan using members_pkey on members  (cost=0.00..0.70 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=2)
        Index Cond: (public.members.membid = public.cards.membid)
        Filter: (public.members.membershipid = 40)

8.4 is scanning the whole table instead:

  ->  Seq Scan on members  (cost=0.00..121379.95 rows=2434956 width=4) (actual time=0.024..1085.143 rows=2435153 loops=1)
        Filter: (membershipid = 40)

That returns essentially every single member ID available, which then
gets matched against in a Hash Join. The filter on membershipid isn't
considered selective at all. I'm not sure why 8.4 isn't also
recognizing the value of restricting on membid here, which would cut
down the number of rows coming out of that node.
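
One thing worth checking is what fraction of the table the planner
thinks membershipid = 40 covers. This is just a sketch of how I'd look
at it, assuming membershipid shows up in the MCV list for members:

SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'members'
  AND attname = 'membershipid';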

Is the misestimation of the Nested Loop causing this sequential scan,
because the estimate suggests there are so many more potential values
to join against than there really are? If that's the case, it just
comes full circle back to how the node already discussed above gets its
estimate in the first place.

While there are some statistics anomalies due to data skew on the
production system, I don't see how they could explain this Nested Loop
row explosion. I can tell you in detail why some of the lower-level
data is misestimated by a single order of magnitude. For example, if
you focus on this inner part:

SELECT useraccount FROM cards WHERE membid IN
    (SELECT membid FROM members WHERE commonid = 3594);

  ->  Nested Loop  (cost=11.33..293.71 rows=33 width=29) (actual time=145.940..169.543 rows=2 loops=1)
        ->  HashAggregate  (cost=11.33..11.66 rows=33 width=4) (actual time=64.730..64.732 rows=2 loops=1)
              ->  Index Scan using members_commonid on members  (cost=0.00..11.25 rows=33 width=4) (actual time=64.688..64.703 rows=2 loops=1)
                    Index Cond: (commonid = 3594)
        ->  Index Scan using cards_membid on cards  (cost=0.00..8.53 rows=1 width=33) (actual time=52.400..52.401 rows=1 loops=2)
              Index Cond: (public.cards.membid = public.members.membid)

The index scan on members_commonid here estimates 33 rows when there
are actually only 2 that match. Looking at the table stats for this
relation, the distribution is heavily skewed: 99.7% of the rows are set
to the sole named MCV, the value "0", which is used as a flag for "no
data" instead of NULL (a standard legacy system import compatibility
issue). My guess is that the 250 points of histogram data aren't quite
enough to capture the distribution of the non-zero values well in the
face of that, so the estimate is off by a factor of ~16. That alone
isn't a big enough error to switch any of the efficient index scans
into other forms, though. The actual runtime in this part of the plan
isn't suffering that badly from the error; it's more that other plan
decisions aren't being made well around it.
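
If the histogram really is too coarse to represent the non-zero values,
one low-risk experiment would be to raise the per-column statistics
target on commonid, re-analyze, and look at what gets collected. This
is only a sketch; the 1000 is an arbitrary value I haven't tested
against this system:

ALTER TABLE members ALTER COLUMN commonid SET STATISTICS 1000;
ANALYZE members;

-- Re-check the collected stats afterward:
SELECT n_distinct, most_common_vals, most_common_freqs, histogram_bounds
FROM pg_stats
WHERE tablename = 'members'
  AND attname = 'commonid';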

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
