Re: multivariate statistics v14

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)2ndquadrant(dot)com, petr(at)2ndquadrant(dot)com, jeff(dot)janes(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multivariate statistics v14
Date: 2016-03-28 08:42:28
Message-ID: 95089064-e388-2cd1-ab62-7c88890eaf67@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 03/26/2016 10:18 AM, Tatsuo Ishii wrote:
>> Fair point. Attached is v18 of the patch, after pgindent cleanup.
>
> Here is some feedback on the v18 patch.
>
> 1) regarding examples in create_statistics manual
>
> Here are the numbers I got. "with statistics" refers to the case where
> multivariate statistics are used. "without statistics" refers to the
> case where multivariate statistics are not used. The numbers denote
> estimated_rows/actual_rows, so closer to 1.0 is better. Some numbers
> are shown as a fraction to avoid division by zero. In my understanding,
> cases 1, 3 and 4 show that multivariate statistics are superior.
>
>         with statistics   without statistics
> case1   0.98              0.01
> case2   98/0              1/0

Case 2 shows that functional dependencies assume the conditions used in
a query are compatible with each other. When they are not (so that no
row can match), that's something this type of statistics can't fix.
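
To illustrate the point about case 2, here is a minimal sketch in Python.
It is a toy model and not the patch code: the table layout (column b is
fully determined by column a), the function names, and the rule "if a
determines b, ignore the clause on b" are all assumptions made up for this
example.

```python
from fractions import Fraction

# Toy table: 10,000 rows, 100 distinct values of a, and b = a // 10,
# so a functionally determines b (hypothetical data, for illustration).
N = 10_000
rows = [(i % 100, (i % 100) // 10) for i in range(N)]


def selectivity(col, value):
    """Fraction of rows where the given column equals value."""
    return Fraction(sum(1 for r in rows if r[col] == value), N)


def actual(a, b):
    """Real number of rows matching a = ? AND b = ?."""
    return sum(1 for r in rows if r == (a, b))


def estimate_independent(a, b):
    """Estimate assuming the two conditions are independent."""
    return int(N * selectivity(0, a) * selectivity(1, b))


def estimate_with_dependency(a, b):
    """Estimate assuming a -> b: the clause on b is treated as implied
    by the clause on a, so only the selectivity of a is used."""
    return int(N * selectivity(0, a))


# Compatible conditions (15 // 10 == 1): the dependency helps.
print(actual(15, 1), estimate_independent(15, 1), estimate_with_dependency(15, 1))
# Incompatible conditions (15 // 10 != 2): no row matches, but the
# dependency-based estimate cannot know that and stays the same.
print(actual(15, 2), estimate_independent(15, 2), estimate_with_dependency(15, 2))
```

With compatible conditions the dependency-based estimate matches the actual
row count, while the independence assumption underestimates by a factor of
ten. With incompatible conditions the actual count is zero and the
dependency-based estimate ends up further off than the independent one,
which is the same pattern as the case 2 numbers above.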

> case3   1.05              0.01
> case4   1/0               103/0
> case5   18.50             18.33
> case6   111123/0          1111123/0

The last two lines (case5 + case6) seem a bit suspicious. I believe
those are for the histogram data, and I do get these numbers:

        with statistics        without statistics
case5   0.93 (5517 / 5949)     42.0 (249943 / 5949)
case6   100/0                  100/0

Perhaps you've been using the version before the bugfix, with ANALYZE on
the wrong table?

>
> 2) following comments by me are not addressed in the v18 patch.
>
>> - There are no docs for pg_mv_statistic (should be added to "49. System
>> Catalogs")
>>
>> - The phrase "multivariate statistics" or something like that should
>> appear in the index.
>>
>> - There should be some explanation of how to deal with multivariate
>> statistics in the "14.1 Using Explain" and "14.2 Statistics used by
>> the Planner" sections.

Yes, those are valid omissions. I plan to address them, and I'm also
considering adding a section to 65.1 (How the Planner Uses Statistics),
explaining more thoroughly how the planner uses multivariate stats.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
