Re: More stable query plans via more predictable column statistics

From: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
To: Joel Jacobson <joel(at)trustly(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Stefan Litsche <stefan(dot)litsche(at)zalando(dot)de>
Subject: Re: More stable query plans via more predictable column statistics
Date: 2016-03-08 18:25:06
Message-ID: CACACo5RP3aO-vQxB+10-iGJiEGgPeHPyugDQbLcRdBOaxzmEZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 8, 2016 at 3:36 PM, Joel Jacobson <joel(at)trustly(dot)com> wrote:

> Hi Alex,
>
> Thanks for excellent research.
>

Joel,

Thank you for spending your time to run these :-)

I've ran your queries against Trustly's production database and I can
> confirm your findings, the results are similar:
>
> WITH ...
> SELECT count(1),
> min(hist_ratio)::real,
> avg(hist_ratio)::real,
> max(hist_ratio)::real,
> stddev(hist_ratio)::real
> FROM stats2
> WHERE histogram_bounds IS NOT NULL;
>
> -[ RECORD 1 ]----
> count | 2814
> min | 0.193548
> avg | 0.927357
> max | 1
> stddev | 0.164134
>
>
> WHERE distinct_hist < num_hist
> -[ RECORD 1 ]----
> count | 624
> min | 0.193548
> avg | 0.672407
> max | 0.990099
> stddev | 0.194901
>
>
> WITH ..
> SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
> WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
> n_distinct, null_frac,
> num_mcv, most_common_vals, most_common_freqs,
> mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
> distinct_hist, num_hist, hist_ratio,
> histogram_bounds
> FROM stats2
> ORDER BY hist_ratio
> LIMIT 1;
>
> -[ RECORD 1
> ]-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> columnname | public.x.y
> n_distinct | 103
> null_frac | 0
> num_mcv | 10
> most_common_vals | {0,1,2,3,4,5,6,7,8,9}
> most_common_freqs |
>
> {0.4765,0.141733,0.1073,0.0830667,0.0559667,0.0373333,0.0251,0.0188,0.0141,0.0113667}
> mcv_frac | 0.971267
> nonnull_mcv_frac | 0.971267
> distinct_hist | 18
> num_hist | 93
> hist_ratio | 0.193548387096774
> histogram_bounds |
>
> {10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}
>

I don't want to be asking for too much here, but is there a chance you
could try the effects of the proposed patch on an offline copy of your
database?

Do you envision or maybe have experienced problems with query plans
referring to the columns that are near the top of the above hist_ratio
report? In other words: what are the practical implications for you with
the values being duplicated rather badly throughout the histogram like in
the example you shown?

Thank you!
--
Alex

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Feld, Michael (IMS) 2016-03-08 18:27:06 pg_upgrade error regarding hstore operator
Previous Message Robert Haas 2016-03-08 18:21:05 Re: Freeze avoidance of very large table.