Re: WIP: multivariate statistics / proof of concept

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Katharina Büchse" <katharina(dot)buechse(at)uni-jena(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: multivariate statistics / proof of concept
Date: 2014-11-13 16:42:25
Message-ID: 8106e11197849725375a933e1cc1409f.squirrel@2.emaily.eu
Lists: pgsql-hackers

On 13 November 2014, 16:51, Katharina Büchse wrote:
> On 13.11.2014 14:11, Tomas Vondra wrote:
>
>> The only place where I think this might work is the associative rules.
>> It's simple to specify rules like ("ZIP code" implies "city") and we
>> could even do some simple check against the data to see if it actually
>> makes sense (and 'disable' the rule if not).
>
> and even this simple example has its limits: at least in Germany, ZIP
> codes are not unique in rural areas, where several villages share the
> same ZIP code.
>
> I guess there are just a few examples where columns are completely
> functionally dependent without any exceptions.
> But of course, if the user gives this information just for optimizing
> the statistics, some exceptions don't matter.
> If this information should be used for creating different execution
> plans (e.g. there is an index on column A and column B is functionally
> dependent, so one could think about using the index on A and the
> dependency instead of scanning the whole table to find all tuples
> that match the query on column B), exceptions are a very important issue.

Yes, exactly. The aim of this patch is "only" improving estimates, not
removing conditions from the plan (e.g. checking only the ZIP code and not
the city name). That certainly can't be done solely based on approximate
statistics, and as you point out most real-world data either contain bugs
or are inherently imperfect (we have the same kind of ZIP/city
inconsistencies in the Czech Republic). That's not a big issue for
estimates (assuming only a small fraction of rows violates the rule), though.
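
To make the "simple check against the data" idea a bit more concrete, here is a
rough sketch in Python (not code from the patch). It measures what fraction of
rows contradict a rule such as "ZIP code implies city" and keeps the rule only
if that fraction is small. The function names and the 5% threshold are
assumptions made up for this illustration.

```python
from collections import Counter, defaultdict


def dependency_violation_fraction(rows):
    """Return the fraction of (a, b) rows that contradict the rule "a implies b".

    For each value of a, the most common b is treated as the value the rule
    predicts; every other row with that a counts as a violation.
    """
    counts = defaultdict(Counter)
    total = 0
    for a, b in rows:
        counts[a][b] += 1
        total += 1
    if total == 0:
        return 0.0
    # Rows that agree with the dominant b for their a are consistent.
    consistent = sum(c.most_common(1)[0][1] for c in counts.values())
    return (total - consistent) / total


def rule_usable(rows, max_violations=0.05):
    """Hypothetical check: keep the rule only if few rows violate it.

    max_violations is an arbitrary threshold chosen for this sketch.
    """
    return dependency_violation_fraction(rows) <= max_violations


# A ZIP code shared by two villages, with one village dominating.
sample = [("07743", "Jena")] * 98 + [("07743", "Other")] * 2
print(rule_usable(sample))
```

A check like this is only good enough for deciding whether to trust the rule
when estimating; it says nothing about whether a condition could safely be
dropped from a plan.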

Tomas
