Re: Additional Statistics Hooks

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Mat Arye <mat(at)timescale(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Additional Statistics Hooks
Date: 2018-03-15 14:29:38
Message-ID: b2d1b31f-92d1-5c23-8c2b-b3b9750e3160@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/15/2018 06:00 AM, Ashutosh Bapat wrote:
> On Tue, Mar 13, 2018 at 8:55 PM, Mat Arye <mat(at)timescale(dot)com> wrote:
>>>
>>> Like cost associated with a function, we may associate mapping
>>> cardinality with a function. It tells how many distinct input values
>>> map to 1 output value. By input value, I mean input argument tuple. In
>>> Mat's case the mapping cardinality will be 12. The number of distinct
>>> values that function may output is estimated as number of estimated
>>> rows / mapping cardinality of that function.
>>
>>
>> I think this is complicated by the fact that the mapping cardinality is not
>> a constant per function
>> but depends on the constant given as the first argument to the function and
>> the granularity of the
>> underlying data (do you have a second-granularity or microsecond
>> granularity). I actually think the logic for the
>> estimate here should be the (max(time)-min(time))/interval. I think to be
>> general you need to allow functions on statistics to determine the estimate.
>>
>
> I think my solution was quite short-sighted. You are right. We need a
> function taking statistics about the input argument as input and
> output the statistics about the output. The planner can then use this
> statistics to arrive at various estimates.
>

I think the best solution is to extend the CREATE STATISTICS so that it
handles things like

CREATE STATISTICS s ON date_trunc('day', column) FROM table

As David mentioned elsewhere in this thread, this was considered before
CREATE STATISTICS was introduced in PG11 and it's why the features is
called 'extended' and not 'multivariate'.

It would give us the same stats as we have for expression indexes, but
without the extra overhead.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ildar Musin 2018-03-15 14:31:50 Re: using index or check in ALTER TABLE SET NOT NULL
Previous Message Robert Haas 2018-03-15 14:16:48 Re: [HACKERS] Partition-wise aggregation/grouping