Specifying attribute slot for storing/reading statistics

From: Esteban Zimanyi <ezimanyi(at)ulb(dot)ac(dot)be>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Specifying attribute slot for storing/reading statistics
Date: 2019-09-05 09:39:44
Message-ID: CAPqRbE5F3Hft8+e8prrc7Ew8N5uydrrFD2Xy-CCONVTC+zFnvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear all

We are developing MobilityDB, an open source PostgreSQL/PostGIS extension
that provides temporal and spatio-temporal types. The source code, manuals,
and related publications are available at the address
https://github.com/ULB-CoDE-WIT/MobilityDB/
<https://github.com/ULB-CoDE-WIT/MobilityDB/tree/stats>

In MobilityDB temporal types are types derived from PostgreSQL/PostGIS
types to which a time dimension is added. MobilityDB provides the following
temporal types: tbool (temporal boolean), tint (temporal int), tfloat
(temporal float), text (temporal text), tgeompoint (temporal geometric
points) and tgeogpoint (temporal geographic points). For example, we can
define a tfloat and a tgeompoint as follows

SELECT tfloat '[1(dot)5(at)2000-01-01, 2(dot)5(at)2000-01-02, 1(dot)5(at)2000-01-03]';
SELECT tgeompoint '[Point(0 0)@2000-01-01 08:00, Point(1 0)@2000-01-02
08:05, Point(1 1)@2000-01-03 08:10]';

We are developing the analyze/selectivity functions for those types. Our
approach is to use the standard PostgreSQL/PostGIS functions for the value
and the time dimensions where the slots starting from 0 will be used for
the value dimension, and the slots starting from 2 will be used for the
time dimension. For example, for tfloat we use range_typanalyze and related
functions for
* collecting in slots 0 and 1, STATISTIC_KIND_BOUNDS_HISTOGRAM
and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the float ranges of the value
dimension
* collecting in slots 2 and 3, STATISTIC_KIND_BOUNDS_HISTOGRAM
and STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM for the periods (similar to
tstzranges) of the time dimension

However, we end up copying several PostgreSQL functions to which we only
add an additional parameter stating the slot number from which the specific
statistic kind should be found (either 0 or 2)

bool
get_attstatsslot_mobdb(AttStatsSlot *sslot, HeapTuple statstuple,
int reqkind, Oid reqop, int flags, int startslot)
{
[...]
for (i = startslot; i < STATISTIC_NUM_SLOTS; i++)
{
if ((&stats->stakind1)[i] == reqkind &&
(reqop == InvalidOid || (&stats->staop1)[i] == reqop))
break;
}
[...]
}

double
var_eq_const_mobdb(VariableStatData *vardata, Oid operator, Datum constval,
bool negate, int startslot)
{
[...]
}
Selectivity
scalarineqsel_mobdb(PlannerInfo *root, Oid operator, bool isgt, bool iseq,
VariableStatData *vardata, Datum constval, Oid consttype,
int startslot)
{
[...]
}

static Selectivity
mcv_selectivity_mobdb(VariableStatData *vardata, FmgrInfo *opproc,
Datum constval, Oid atttype, bool varonleft,
double *sumcommonp, int startslot)
{
[...]
}
static double
ineq_histogram_selectivity_mobdb(PlannerInfo *root, VariableStatData *
vardata,
FmgrInfo *opproc, bool isgt, bool iseq, Datum constval,
Oid consttype, int startslot)
{
[...]
}

in addition to copying other functions needed by the above functions since
they are not exported (defined as static)

static bool
get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
Oid sortop, Datum *min, Datum *max)

static bool
get_actual_variable_endpoint(Relation heapRel,
Relation indexRel, ScanDirection indexscandir,
ScanKey scankeys, int16 typLen,
bool typByVal, MemoryContext outercontext,
Datum *endpointDatum)

[...]

Is there a better way to do this ?

Is there any chance that the API for accessing the typanalyze and
selectivity functions will be enhanced in a future release ?

Regards

Esteban

--
------------------------------------------------------------
Prof. Esteban Zimanyi
Department of Computer & Decision Engineering (CoDE) CP 165/15
Universite Libre de Bruxelles
Avenue F. D. Roosevelt 50
B-1050 Brussels, Belgium
fax: + 32.2.650.47.13
tel: + 32.2.650.31.85
e-mail: ezimanyi(at)ulb(dot)ac(dot)be
Internet: http://code.ulb.ac.be/
------------------------------------------------------------

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-09-05 09:56:29 Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?
Previous Message Dilip Kumar 2019-09-05 09:33:36 Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?