Re: WIP: cross column correlation ...

From: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: WIP: cross column correlation ...
Date: 2011-02-26 18:12:42
Message-ID: 45B07768-8DD5-4FC9-A82D-54E3CD40EFA6@cybertec.at
Lists: pgsql-hackers

>>>
>>
>> Still, having more data a user can probe would be nice.
>>
>> I wonder why everyone avoids Microsoft's approach to the subject. Apparently, they go in the 'auto-tune as much as possible' direction.
>> And in tests we did a while ago, a team from Microsoft and a team from Oracle were asked to optimise the same set of queries over the same data (bookies' data, loads of it); the auto-tuning in Microsoft's
>> SQL Server performed much better than a team of over-sweating Oracle DBAs.
>
> I don't think *anyone* is avoiding that approach. There is almost
> universal consensus here that auto-tuning is better than manual
> tuning, even to the extent of being unwilling to add knobs to allow
> manual tuning of settings we have no idea how to auto-tune and no
> plans to auto-tune.
>
>> In my current workplace we have many deployments of the same system, over different types of machines, each with customer data that varies so much that queries need to be rather generic.
>> PostgreSQL shows its strength here, with the planner doing a good job across different variants of data; however, we do very little tweaking of the configuration parameters, simply because there are too many of them to keep track of.
>> I suspect the systems could behave much better, but no one is going to tweak settings for 50 different installations over 50 different types of data and 50 different sets of hardware.
>> If even a tiny amount of automation were provided in PostgreSQL, I would welcome it with open arms.
>
> What do you have in mind?
>

What we are trying to do is to explicitly store column correlations: a histogram for the (a, b) combination and so on.
The planner code then goes through the restrictions in the query and finds the best / longest combination of columns for which statistics have been defined.
It seems we can also do this for join selectivity and for expressions. The planner code for "raw column correlation" without expressions (cos(id) or so) and without joins is there (WIP; no ANALYZE support and so on so far).
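To illustrate the idea (not the actual patch, which lives in the C planner code), here is a toy Python sketch contrasting the independence assumption the planner uses today with a joint (a, b) statistic of the kind described above. The table data and the predicate are invented for the example:

```python
from collections import Counter

# Fake table: city and zip code are strongly correlated.
rows = [("Vienna", "1010")] * 50 + [("Graz", "8010")] * 50

def single_column_selectivity(rows, col_idx, value):
    """Fraction of rows matching value in one column (per-column stats)."""
    return sum(1 for r in rows if r[col_idx] == value) / len(rows)

def joint_selectivity(rows, values):
    """Fraction of rows matching all values at once (cross-column stats)."""
    return Counter(rows)[values] / len(rows)

# WHERE city = 'Vienna' AND zip = '1010'
independent = (single_column_selectivity(rows, 0, "Vienna")
               * single_column_selectivity(rows, 1, "1010"))
joint = joint_selectivity(rows, ("Vienna", "1010"))

print(f"independence assumption: {independent:.2f}")  # 0.25 -> underestimate
print(f"joint statistics:        {joint:.2f}")        # 0.50 -> correct
```

The point is exactly the size-estimate problem: multiplying per-column selectivities underestimates the row count whenever the columns are correlated, and the joint statistic fixes that for the combinations we have sampled.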

I think auto-tuning is a good thing to have, and the door to actually doing it is wide open with our approach.
All it takes is a mechanism to see which "conditions" are used how often; somebody could then write a job which automatically tells the system which stats to collect / sample.
I think for an "average" user that would be the simplest thing. But to get there we have to get the sampling and the rest of the planner code right in the first place.
Auto-tuning in this area is still far in the future - but at least the road to it is clear.
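As a rough sketch of what such a job could look like (purely hypothetical: the workload list, the threshold and the function names are invented, and a real implementation would hook into the planner or the query log instead), in Python:

```python
from collections import Counter

# (table, set of columns referenced together in a WHERE clause)
observed_restrictions = [
    ("customers", frozenset({"city", "zip"})),
    ("customers", frozenset({"city", "zip"})),
    ("customers", frozenset({"city"})),
    ("orders",    frozenset({"status", "region"})),
]

def suggest_cross_column_stats(restrictions, min_count=2):
    """Return multi-column combinations seen at least min_count times."""
    counts = Counter(
        (table, cols) for table, cols in restrictions if len(cols) > 1
    )
    return [(table, sorted(cols), n)
            for (table, cols), n in counts.items() if n >= min_count]

for table, cols, n in suggest_cross_column_stats(observed_restrictions):
    print(f"collect joint stats on {table}({', '.join(cols)}) -- seen {n}x")
```

Such a job would only ever suggest which combinations to sample; the sampling itself still depends on getting the ANALYZE and planner pieces right first.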

Some people have suggested approaches dealing with effective_cache_size and so on ... there are many good approaches there, but they don't address the actual problem of wrong size estimates.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
