Re: Strange query optimization in 7.3.2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alec Mitchell <apm13(at)columbia(dot)edu>
Cc: pgsql-general(at)postgreSQL(dot)org
Subject: Re: Strange query optimization in 7.3.2
Date: 2003-04-15 05:39:45
Message-ID: 5249.1050385185@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Alec Mitchell <apm13(at)columbia(dot)edu> writes:
> On Monday 14 April 2003 04:08 pm, Tom Lane wrote:
>> Could you see your way to sending me a dump of these tables for testing
>> purposes? You could exclude the columns not used in the FROM/WHERE
>> clauses, if that is needed to satisfy privacy worries.

> I've attached dumps of the relevant tables (not including the stops
> table (s), it is huge, and doesn't come into play until after the bad
> planner estimate).

Indeed, there's a problem here with nulls --- there is a case in
eqjoinsel() that didn't account for nulls, and should've. I've applied
the attached patch to fix it. This reduces the estimate of the tr/t/m
join size from 1119 to 332, which is still large compared to the true
size of 52, but it's a lot better. Without the stops table, I can't
really tell if this changes the complete plan or not --- could you apply
the patch and find out?

BTW, the remaining error seems to be because the tr.group_num = 1
constraint skews the fraction of matching "trailer" values. I fear
there's little chance of persuading the system to figure that out in the
near future --- it can't be done without cross-column correlation
statistics, which we do not keep.

regards, tom lane

*** src/backend/utils/adt/selfuncs.c.orig Sat Mar 22 20:49:13 2003
--- src/backend/utils/adt/selfuncs.c Tue Apr 15 01:13:01 2003
***************
*** 1552,1578 ****
{
/*
* We do not have MCV lists for both sides. Estimate the join
! * selectivity as MIN(1/nd1, 1/nd2). This is plausible if we
! * assume that the values are about equally distributed: a
! * given tuple of rel1 will join to either 0 or N2/nd2 rows of
! * rel2, so total join rows are at most N1*N2/nd2 giving a
! * join selectivity of not more than 1/nd2. By the same logic
! * it is not more than 1/nd1, so MIN(1/nd1, 1/nd2) is an upper
! * bound. Using the MIN() means we estimate from the point of
! * view of the relation with smaller nd (since the larger nd
! * is determining the MIN). It is reasonable to assume that
! * most tuples in this rel will have join partners, so the
! * bound is probably reasonably tight and should be taken
! * as-is.
*
* XXX Can we be smarter if we have an MCV list for just one
* side? It seems that if we assume equal distribution for the
* other side, we end up with the same answer anyway.
*/
if (nd1 > nd2)
! selec = 1.0 / nd1;
else
! selec = 1.0 / nd2;
}

if (have_mcvs1)
--- 1552,1584 ----
{
/*
* We do not have MCV lists for both sides. Estimate the join
! * selectivity as MIN(1/nd1,1/nd2)*(1-nullfrac1)*(1-nullfrac2).
! * This is plausible if we assume that the join operator is
! * strict and the non-null values are about equally distributed:
! * a given non-null tuple of rel1 will join to either zero or
! * N2*(1-nullfrac2)/nd2 rows of rel2, so total join rows are at
! * most N1*(1-nullfrac1)*N2*(1-nullfrac2)/nd2 giving a join
! * selectivity of not more than (1-nullfrac1)*(1-nullfrac2)/nd2.
! * By the same logic it is not more than
! * (1-nullfrac1)*(1-nullfrac2)/nd1, so the expression with MIN()
! * is an upper bound. Using the MIN() means we estimate from the
! * point of view of the relation with smaller nd (since the larger
! * nd is determining the MIN). It is reasonable to assume that
! * most tuples in this rel will have join partners, so the bound
! * is probably reasonably tight and should be taken as-is.
*
* XXX Can we be smarter if we have an MCV list for just one
* side? It seems that if we assume equal distribution for the
* other side, we end up with the same answer anyway.
*/
+ double nullfrac1 = stats1->stanullfrac;
+ double nullfrac2 = stats2->stanullfrac;
+
+ selec = (1.0 - nullfrac1) * (1.0 - nullfrac2);
if (nd1 > nd2)
! selec /= nd1;
else
! selec /= nd2;
}

if (have_mcvs1)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2003-04-15 05:44:59 Re: [GENERAL] CREATE USER within function
Previous Message Corey Scott 2003-04-15 05:08:05 Performance Difference Between the different function types