Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: smclellan(at)mintel(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5120: Performance difference between running a query with named cursor and straight SELECT
Date: 2009-10-15 21:33:17
Message-ID: 14062.1255642397@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Steve McLellan <smclellan(at)mintel(dot)com> writes:
> Sorry, I should've tried that. The two explains are below; without cursor
> then with cursor. I don't know enough to reliably say whether there's
> anything wrong with either (we use default_statistics_target=100 although
> the estimates don't look right for some of the query fragments), but they're
> certainly very different.

Right, so the cursor plan is one that is estimated to take 3x longer,
but can deliver rows starting right away, where the other plan will have
a long delay and then spit everything out in a bunch. It's not a bad
tradeoff, if the estimate were right. The killer mistake is here:

> -> Index Scan using "Originator Type" on originator_dimension
> (cost=0.00..125.29 rows=1 width=4) (actual time=26.067..41.906 rows=169
> loops=1)
> Index Cond: ("Originator Type" = 'NO DATA'::text)
> Filter: ("Area" = ANY ('{National,"Phoenix, AZ"}'::text[]))

Since that's on the outside of a nestloop, the 169x rowcount error
translates directly to a 169x error in the total runtime estimate ---
and it looks like that's not very far at all from the real ratio,
so the other estimates are pretty good.

I'm not sure whether you'd get any real improvement from a further
increase in stats target, though that's certainly an easy thing to
try. What this looks like to me is the longstanding problem of having
correlations between the distributions of two different columns
(here, Area and Originator Type), which the planner doesn't know about.
So its estimate of the combined probability of meeting the conditions
is off. You could try checking estimated vs real rowcount for the
area and originator conditions separately to see if that's the case
or not. If either one is off by itself then a stats increase for that
column would help, if not there's not much to be done except lean on
the cursor target knob ...

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Douglas, Ryan 2009-10-15 21:47:03 Re: BUG #5121: Segmentation Fault when using pam w/ krb5
Previous Message Heikki Linnakangas 2009-10-15 21:22:32 Re: BUG #5121: Segmentation Fault when using pam w/ krb5