Re: More stable query plans via more predictable column statistics

From: Joel Jacobson <joel(at)trustly(dot)com>
To: "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Stefan Litsche <stefan(dot)litsche(at)zalando(dot)de>
Subject: Re: More stable query plans via more predictable column statistics
Date: 2016-03-08 14:36:01
Message-ID: CAASwCXdX4yneNF-J_=7yeQb+YbWGtGSp=LX4nMFLkV9jxPoyHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alex,

Thanks for excellent research.

I've ran your queries against Trustly's production database and I can
confirm your findings, the results are similar:

WITH ...
SELECT count(1),
min(hist_ratio)::real,
avg(hist_ratio)::real,
max(hist_ratio)::real,
stddev(hist_ratio)::real
FROM stats2
WHERE histogram_bounds IS NOT NULL;

-[ RECORD 1 ]----
count | 2814
min | 0.193548
avg | 0.927357
max | 1
stddev | 0.164134

WHERE distinct_hist < num_hist
-[ RECORD 1 ]----
count | 624
min | 0.193548
avg | 0.672407
max | 0.990099
stddev | 0.194901

WITH ..
SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
n_distinct, null_frac,
num_mcv, most_common_vals, most_common_freqs,
mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
distinct_hist, num_hist, hist_ratio,
histogram_bounds
FROM stats2
ORDER BY hist_ratio
LIMIT 1;

-[ RECORD 1 ]-----+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
columnname | public.x.y
n_distinct | 103
null_frac | 0
num_mcv | 10
most_common_vals | {0,1,2,3,4,5,6,7,8,9}
most_common_freqs |
{0.4765,0.141733,0.1073,0.0830667,0.0559667,0.0373333,0.0251,0.0188,0.0141,0.0113667}
mcv_frac | 0.971267
nonnull_mcv_frac | 0.971267
distinct_hist | 18
num_hist | 93
hist_ratio | 0.193548387096774
histogram_bounds |
{10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}

On Mon, Jan 18, 2016 at 4:46 PM, Shulgin, Oleksandr
<oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
> On Wed, Dec 2, 2015 at 10:20 AM, Shulgin, Oleksandr
> <oleksandr(dot)shulgin(at)zalando(dot)de> wrote:
>>
>> On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>> "Shulgin, Oleksandr" <oleksandr(dot)shulgin(at)zalando(dot)de> writes:
>>> > This post summarizes a few weeks of research of ANALYZE statistics
>>> > distribution on one of our bigger production databases with some
>>> > real-world
>>> > data and proposes a patch to rectify some of the oddities observed.
>>>
>>> Please add this to the 2016-01 commitfest ...
>>
>>
>> Added: https://commitfest.postgresql.org/8/434/
>
>
> It would be great if some folks could find a moment to run the queries I was
> showing on their data to confirm (or refute) my findings, or to contribute
> to the picture in general.
>
> As I was saying, the queries were designed in such a way that even
> unprivileged user can run them (the results will be limited to the stats
> data available to that user, obviously; and for custom-tailored statstarget
> one still needs superuser to join the pg_statistic table directly). Also,
> on the scale of ~30k attribute statistics records, the queries take only a
> few seconds to finish.
>
> Cheers!
> --
> Alex
>

--
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Adrian Klaver 2016-03-08 14:43:37 Re: Exclude pg_largeobject form pg_dump
Previous Message Robert Haas 2016-03-08 13:59:04 Re: Freeze avoidance of very large table.