Re: performance regression in 9.2/9.3

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Linos <info(at)linos(dot)es>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: performance regression in 9.2/9.3
Date: 2014-06-05 17:39:57
Message-ID: 10598.1401989997@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Thu, Jun 5, 2014 at 9:54 AM, Linos <info(at)linos(dot)es> wrote:
>> What I don't understand is why the statistics have this bad information, all my tests are done on a database just restored and analyzed. Can I do something to improve the quality of my database statistics and let the planner do better choices? Maybe increase the statistics target of the columns involved?

> By that I meant row count estimates coming out of the joins are way
> off. This is pushing the planner into making bad choices. The most
> pervasive problem I see is that the row count estimate boils down to
> '1' at some juncture causing the server to favor nestloop/index scan
> when something like a hash join would likely be more appropriate.

There's some fairly wacko stuff going on in this example, like why
is the inner HashAggregate costed so much higher by 9.3 than 8.4,
when the inputs are basically the same? And why does 9.3 fail to
suppress the SubqueryScan on "ven", when 8.4 does get rid of it?
And why is the final output rows estimate so much higher in 9.3?
That one is actually higher than the product of the two nestloop
inputs, which looks like possibly a bug.

I think what's happening is that 9.3 is picking what it knows to be a less
than optimal join method so that it can sort the output by means of the
ordered scan "Index Scan using referencia_key on modelo mo", and thereby
avoid an explicit sort of what it thinks would be 42512461 rows. With a
closer-to-reality estimate there, it would have gone for a plan more
similar to 8.4's, ie, hash joins and then an explicit sort.

There is a lot going on in this plan that we haven't been told about; for
instance at least one of the query's tables seems to actually be a view,
and some other ones appear to be inheritance trees with partitioning
constraints, and I'm suspicious that some of the aggregates might be
user-defined functions with higher than normal costs.

I'd like to see a self-contained test case, by which I mean full details
about the table/view schemas; it's not clear whether the actual data
is very important here.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-06-05 17:43:06 Re: slotname vs slot_name
Previous Message Alexey Klyukin 2014-06-05 16:34:24 Re: Could not open file pg_multixact/offsets/ ERROR on 9.3.4