Quick Links

Re: MCV lists for highly skewed distributions

From:	John Naylor <jcnaylor(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: MCV lists for highly skewed distributions
Date:	2018-01-19 12:33:05
Message-ID:	CAJVSVGXVyLRmHS5su34d1gqGj5NMh0Ga+3VkTOTqKY9HUabcfQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

>> [1] Occasionally it will store a much longer MCV list, because no values
>> was
>> sampled exactly once, which triggers a different code path in which all
>> seen
>> values are put in the MCV and the histogram is NULL. This is not
>> reliable,
>> as whether the least-sample value is present in the sample once or twice
>> is
>> pretty brittle.
>
> And we need a better discussion of risk: Before we generated too few
> MCV entried. To what extent might me now generate too many? Which
> would be a problem in increased planning time.

(My apologies, I just now found some time to test this further, but I
don't have additional results yet)

Simon,
Earlier, I referenced a thread [1] complaining that we currently
already have too many MCVs in the case of uniform distributions, with
worse consequences than planning time. Based on my (admittedly quick
and dirty) preliminary testing (see attachment from a couple weeks
ago), this patch exacerbates that problem, and I was hoping to find a
way to fix that.

> I have a slight reservaton about whether 1.25x is still a sensible
> heuristic.

This was also discussed in [1], but no patch came out of it. I was
just now turning the formulas discussed there into code, but I'll
defer to someone with more expertise. FWIW, I suspect that a solution
that doesn't take into account a metric like coefficient of variation
will have the wrong behavior sometimes, whether for highly uniform or
highly non-uniform distributions.

[1] https://www.postgresql.org/message-id/flat/32261(dot)1496611829%40sss(dot)pgh(dot)pa(dot)us#32261(dot)1496611829(at)sss(dot)pgh(dot)pa(dot)us

-John Naylor

In response to

Re: MCV lists for highly skewed distributions at 2018-01-19 07:42:38 from Simon Riggs

Responses

Re: MCV lists for highly skewed distributions at 2018-01-20 05:00:15 from John Naylor
Re: MCV lists for highly skewed distributions at 2018-01-21 07:26:55 from John Naylor

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2018-01-19 12:52:26	Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Previous Message	Christoph Berg	2018-01-19 12:23:24	Re: Package version in PG_VERSION and version()