Re: Incorrect estimates on columns

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Kratz <chris(dot)kratz(at)vistashare(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Incorrect estimates on columns
Date: 2007-10-18 00:23:39
Message-ID: 13440.1192667019@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Chris Kratz <chris(dot)kratz(at)vistashare(dot)com> writes:
> On Wednesday 17 October 2007 14:49, Tom Lane wrote:
>> Evidently it's not realizing that every row of par will have a join
>> partner, but why not? I suppose a.activityid is unique, and in most
>> cases that I've seen the code seems to get that case right.
>>
>> Would you show us the pg_stats rows for par.activity and a.activityid?

> Here are the pg_stats rows for par.activity and a.activityid.

Hmm, nothing out of the ordinary there.

I poked at this a bit and realized that what seems to be happening is
that the a.programid = 171 condition is reducing the selectivity
estimate --- that is, it knows that that will filter out X percent of
the activity rows, and it assumes that *the size of the join result will
be reduced by that same percentage*, since join partners would then be
missing for some of the par rows. The fact that the join result doesn't
actually decrease in size at all suggests that there's some hidden
correlation between the programid condition and the condition on
par.provider_lfm. Is that true? Maybe you could eliminate one of the
two conditions from the query?

Since PG doesn't have any cross-table (or even cross-column) statistics
it's not currently possible for the optimizer to deal very well with
hidden correlations like this ...

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ow Mun Heng 2007-10-18 01:33:46 Re: Shared Buffer setting in postgresql.conf
Previous Message Tom Lane 2007-10-17 23:28:42 Re: Huge amount of memory consumed during transaction