Re: multivariate statistics v14

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: jeff(dot)janes(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multivariate statistics v14
Date: 2016-03-22 09:44:14
Message-ID: 89341a68-4729-ad28-bb39-cef31849aedb@2ndquadrant.com
Lists: pgsql-hackers

Hello,

On 03/22/2016 09:13 AM, Tatsuo Ishii wrote:
>>> Do you have any other missing parts in this work? I am asking
>>> because I wonder if you want to push this into 9.6 or rather 9.7.
>>
>> I think the first few parts of the patch series, namely:
>>
>> * shared infrastructure (0002)
>> * functional dependencies (0003)
>> * MCV lists (0004)
>> * histograms (0005)
>>
>> might make it into 9.6. I believe the code for building and storing
>> the different kinds of stats is reasonably solid. What probably needs
>> more thorough review are the changes in clauselist_selectivity(), but
>> the code in these parts is reasonably simple as it only supports using
>> a single multi-variate statistics per relation.
>>
>> The part (0006) that allows using multiple statistics (i.e. selects
>> which of the available stats to use and in what order) is probably the
>> most complex part of the whole patch, and I myself do have some
>> questions about some aspects of it. I don't think this part will get
>> into 9.6 at this point (although it'd be nice if we managed to do
>> that).
>
> Hum. So without 0006 or beyond, there's not much benefit for the
> PostgreSQL users, and you are not too confident about 0006 or
> beyond. Then I would think it is a little bit hard to justify
> putting 000[2-5] into 9.6. I really like this feature and would like
> to see it in PostgreSQL someday, but I'm not sure if we should put the
> patches (0002-0005) into PostgreSQL now. Please let me know if there
> are some reasons we should put the patches into PostgreSQL now.

I don't think that's the case. While being able to combine multiple
statistics is certainly useful, I'm convinced that the initial patches
add enough value on their own, even if the 0006 patch gets committed
later.

A lot of queries will be just fine with the "single multivariate
statistics" limitation, either because they use fewer than 8 columns,
or because no more than 8 columns are actually correlated. (FWIW the
8 column limit is mostly arbitrary, it may get increased if needed.)

I haven't really mentioned the aspects of 0006 that I think need more
discussion, but it's mostly about the question whether combining the
statistics by using the overlapping clauses as "conditions" is the right
thing to do (or whether a more expensive approach is needed). None of
that however invalidates the preceding patches.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
