Re: Enhanced containment selectivity function

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-13 02:32:37
Message-ID: 200508130232.j7D2Wb201076@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.2 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Matteo Beccati wrote:
> Hi,
>
> >>Moving it in contrib/ltree would be more difficult to me because it
> >>depends on other functions declared in selfuncs.c
> >>(get_restriction_variable, etc).
> >
> > I'd be willing to consider exporting those functions from selfuncs.c.
>
> In the meanwhile here is the latest patch which uses both mcv and
> histogram values.
>
>
> BTW, when restoring my test database I've found out that there were many
> errors on ALTER INDEX "something" OWNER TO ... :
>
> ERROR: "something" is not a table, view, or sequence
>
> This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be
> wrong, but I didn't get those errors a few days ago (some cvs updates ago).
>
>
> Best regards
> --
> Matteo Beccati
> http://phpadsnew.com/
> http://phppgads.com/

> Index: contrib/ltree/ltree.sql.in
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/contrib/ltree/ltree.sql.in,v
> retrieving revision 1.9
> diff -c -r1.9 ltree.sql.in
> *** contrib/ltree/ltree.sql.in 30 Mar 2004 15:45:32 -0000 1.9
> --- contrib/ltree/ltree.sql.in 6 Aug 2005 13:10:35 -0000
> ***************
> *** 230,236 ****
> RIGHTARG = ltree,
> PROCEDURE = ltree_isparent,
> COMMUTATOR = '<@',
> ! RESTRICT = contsel,
> JOIN = contjoinsel
> );
>
> --- 230,236 ----
> RIGHTARG = ltree,
> PROCEDURE = ltree_isparent,
> COMMUTATOR = '<@',
> ! RESTRICT = parentsel,
> JOIN = contjoinsel
> );
>
> ***************
> *** 248,254 ****
> RIGHTARG = ltree,
> PROCEDURE = ltree_risparent,
> COMMUTATOR = '@>',
> ! RESTRICT = contsel,
> JOIN = contjoinsel
> );
>
> --- 248,254 ----
> RIGHTARG = ltree,
> PROCEDURE = ltree_risparent,
> COMMUTATOR = '@>',
> ! RESTRICT = parentsel,
> JOIN = contjoinsel
> );
>
> Index: src/backend/utils/adt/selfuncs.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
> retrieving revision 1.187
> diff -c -r1.187 selfuncs.c
> *** src/backend/utils/adt/selfuncs.c 21 Jul 2005 04:41:43 -0000 1.187
> --- src/backend/utils/adt/selfuncs.c 6 Aug 2005 13:10:46 -0000
> ***************
> *** 1306,1311 ****
> --- 1306,1488 ----
> return (Selectivity) selec;
> }
>
> + #define DEFAULT_PARENT_SEL 0.001
> +
> + /*
> + * parentsel - Selectivity of parent relationship for ltree data types.
> + */
> + Datum
> + parentsel(PG_FUNCTION_ARGS)
> + {
> + PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
> + Oid operator = PG_GETARG_OID(1);
> + List *args = (List *) PG_GETARG_POINTER(2);
> + int varRelid = PG_GETARG_INT32(3);
> + VariableStatData vardata;
> + Node *other;
> + bool varonleft;
> + Datum *values;
> + int nvalues;
> + float4 *numbers;
> + int nnumbers;
> + double selec = 0.0;
> +
> + /*
> + * If expression is not variable <@ something or something <@ variable,
> + * then punt and return a default estimate.
> + */
> + if (!get_restriction_variable(root, args, varRelid,
> + &vardata, &other, &varonleft))
> + PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
> +
> + /*
> + * If the something is a NULL constant, assume operator is strict and
> + * return zero, ie, operator will never return TRUE.
> + */
> + if (IsA(other, Const) &&
> + ((Const *) other)->constisnull)
> + {
> + ReleaseVariableStats(vardata);
> + PG_RETURN_FLOAT8(0.0);
> + }
> +
> + if (HeapTupleIsValid(vardata.statsTuple))
> + {
> + Form_pg_statistic stats;
> + double mcvsum = 0.0;
> + double mcvsel = 0.0;
> + double hissel = 0.0;
> +
> + stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
> +
> + if (IsA(other, Const))
> + {
> + /* Variable is being compared to a known non-null constant */
> + Datum constval = ((Const *) other)->constvalue;
> + bool match = false;
> + int i;
> +
> + /*
> + * Is the constant "<@" to any of the column's most common
> + * values?
> + */
> + if (get_attstatsslot(vardata.statsTuple,
> + vardata.atttype, vardata.atttypmod,
> + STATISTIC_KIND_MCV, InvalidOid,
> + &values, &nvalues,
> + &numbers, &nnumbers))
> + {
> + FmgrInfo contproc;
> +
> + fmgr_info(get_opcode(operator), &contproc);
> +
> + for (i = 0; i < nvalues; i++)
> + {
> + /* be careful to apply operator right way 'round */
> + if (varonleft)
> + match = DatumGetBool(FunctionCall2(&contproc,
> + values[i],
> + constval));
> + else
> + match = DatumGetBool(FunctionCall2(&contproc,
> + constval,
> + values[i]));
> +
> + /* calculate total selectivity of all most-common-values */
> + mcvsum += numbers[i];
> +
> + /* calculate selectivity of matching most-common-values */
> + if (match)
> + mcvsel += numbers[i];
> + }
> + }
> + else
> + {
> + /* no most-common-values info available */
> + values = NULL;
> + numbers = NULL;
> + i = nvalues = nnumbers = 0;
> + }
> +
> + free_attstatsslot(vardata.atttype, values, nvalues,
> + NULL, 0);
> +
> +
> + /*
> + * Is the constant "<@" to any of the column's histogram
> + * values?
> + */
> + if (get_attstatsslot(vardata.statsTuple,
> + vardata.atttype, vardata.atttypmod,
> + STATISTIC_KIND_HISTOGRAM, InvalidOid,
> + &values, &nvalues,
> + NULL, NULL))
> + {
> + FmgrInfo contproc;
> +
> + fmgr_info(get_opcode(operator), &contproc);
> +
> + for (i = 0; i < nvalues; i++)
> + {
> + /* be careful to apply operator right way 'round */
> + if (varonleft)
> + match = DatumGetBool(FunctionCall2(&contproc,
> + values[i],
> + constval));
> + else
> + match = DatumGetBool(FunctionCall2(&contproc,
> + constval,
> + values[i]));
> + /* count matching histogram values */
> + if (match)
> + hissel++;
> + }
> +
> + if (hissel > 0.0)
> + {
> + /*
> + * some matching values found inside histogram, divide matching entries number
> + * by total histogram entries to get the histogram related selectivity
> + */
> + hissel /= nvalues;
> + }
> + }
> + else
> + {
> + /* no histogram info available */
> + values = NULL;
> + i = nvalues = 0;
> + }
> +
> + free_attstatsslot(vardata.atttype, values, nvalues,
> + NULL, 0);
> +
> +
> + /*
> + * calculate selectivity based on MCV and histogram result
> + * histogram selectivity needs to be scaled down if there are any most-common-values
> + */
> + selec = mcvsel + hissel * (1.0 - mcvsum);
> +
> + /* don't return 0.0 selectivity unless all table values are inside mcv */
> + if (selec == 0.0 && mcvsum != 1.0)
> + selec = DEFAULT_PARENT_SEL;
> + }
> + else
> + selec = DEFAULT_PARENT_SEL;
> + }
> + else
> + selec = DEFAULT_PARENT_SEL;
> +
> +
> + ReleaseVariableStats(vardata);
> +
> + /* result should be in range, but make sure... */
> + CLAMP_PROBABILITY(selec);
> +
> + PG_RETURN_FLOAT8((float8) selec);
> + }
> +
> /*
> * eqjoinsel - Join selectivity of "="
> */
> Index: src/include/catalog/pg_proc.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
> retrieving revision 1.380
> diff -c -r1.380 pg_proc.h
> *** src/include/catalog/pg_proc.h 2 Aug 2005 16:11:57 -0000 1.380
> --- src/include/catalog/pg_proc.h 6 Aug 2005 13:10:59 -0000
> ***************
> *** 3750,3755 ****
> --- 3750,3758 ----
> DATA(insert OID = 2592 ( gist_circle_compress PGNSP PGUID 12 f f t f i 1 2281 "2281" _null_ _null_ _null_ gist_circle_compress - _null_ ));
> DESCR("GiST support");
>
> + DATA(insert OID = 2600 ( parentsel PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_ _null_ parentsel - _null_ ));
> + DESCR("enhanced restriction selectivity for ltree isparent comparison operators");
> +
>
> /*
> * Symbolic values for provolatile column: these indicate whether the result
> Index: src/include/utils/selfuncs.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v
> retrieving revision 1.23
> diff -c -r1.23 selfuncs.h
> *** src/include/utils/selfuncs.h 5 Jun 2005 22:32:58 -0000 1.23
> --- src/include/utils/selfuncs.h 6 Aug 2005 13:11:00 -0000
> ***************
> *** 95,100 ****
> --- 95,102 ----
> extern Datum nlikesel(PG_FUNCTION_ARGS);
> extern Datum icnlikesel(PG_FUNCTION_ARGS);
>
> + extern Datum parentsel(PG_FUNCTION_ARGS);
> +
> extern Datum eqjoinsel(PG_FUNCTION_ARGS);
> extern Datum neqjoinsel(PG_FUNCTION_ARGS);
> extern Datum scalarltjoinsel(PG_FUNCTION_ARGS);
>

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2005-08-13 02:34:08 Re: gettime() - a timeofday() alternative
Previous Message Bruce Momjian 2005-08-13 02:30:43 Re: For Review: Allow WAL information to recover corrupted

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2005-08-13 02:34:08 Re: gettime() - a timeofday() alternative
Previous Message Bruce Momjian 2005-08-13 02:30:43 Re: For Review: Allow WAL information to recover corrupted