Re: MCV lists for highly skewed distributions

From: John Naylor <jcnaylor(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MCV lists for highly skewed distributions
Date: 2018-01-21 07:26:55
Message-ID: CAJVSVGXD+2b+i_fjsa_dkGWN8CkuKaBbfF1-01H=W2GP-B0THQ@mail.gmail.com
Lists: pgsql-hackers

I wrote:

>> I have a slight reservation about whether 1.25x is still a sensible
>> heuristic.
>
> This was also discussed in [1], but no patch came out of it. I was
> just now turning the formulas discussed there into code, but I'll
> defer to someone with more expertise. FWIW, I suspect that a solution
> that doesn't take into account a metric like coefficient of variation
> will have the wrong behavior sometimes, whether for highly uniform or
> highly non-uniform distributions.

I spent a few hours hacking on this, and it turns out calculating the
right number of MCVs taking into account both uniform and highly
non-uniform distributions is too delicate a problem for me to solve
right now. The logic suggested by Dean Rasheed in [1] always produces
no MCVs for a perfectly uniform distribution (which is good), but very
often also for other distributions, which is not good. My efforts to
tweak that didn't work, so I didn't get as far as adapting it for the
problem Jeff is trying to solve.
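
For anyone following along, here is a rough sketch of the two quantities
in play. It is not the patch under review and not the logic from [1].
It assumes a simplified form of the 1.25x rule (keep a value only if its
sample count is at least 1.25 times the average count per distinct
value, with a floor of 2), and it assumes the average is taken over the
sample's own distinct values. The function names are made up for
illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(counts):
    """Population standard deviation of the per-value counts over their mean."""
    avg = mean(counts)
    return pstdev(counts) / avg if avg else 0.0


def mcv_cutoff(counts, factor=1.25, floor=2):
    """Number of most-common values whose count clears the threshold.

    Assumption: a simplified stand-in for the 1.25x heuristic, where the
    threshold is `factor` times the average count, but never below `floor`.
    """
    mincount = max(factor * mean(counts), floor)
    kept = 0
    # Walk the counts from most to least common and stop at the first
    # value that falls below the threshold.
    for count in sorted(counts, reverse=True):
        if count < mincount:
            break
        kept += 1
    return kept


# Hypothetical per-value sample counts, 1000 sampled rows in each case.
uniform = [100] * 10          # every value equally common
skewed = [910] + [10] * 9     # one dominant value, nine rare ones

print(mcv_cutoff(uniform), coefficient_of_variation(uniform))
print(mcv_cutoff(skewed), coefficient_of_variation(skewed))
```

The threshold only compares each count to the mean, while the
coefficient of variation summarizes how spread out the counts are as a
whole, which is why I suspect a rule that ignores it will misbehave at
one extreme or the other.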

I have not been able to come up with a more compelling alternative, so
I have nothing further to say about the patch under review.

> [1]
> https://www.postgresql.org/message-id/flat/32261(dot)1496611829%40sss(dot)pgh(dot)pa(dot)us#32261(dot)1496611829(at)sss(dot)pgh(dot)pa(dot)us

-John Naylor
