Re: Choosing values for multivariate MCV lists

From:	Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Choosing values for multivariate MCV lists
Date:	2019-06-21 07:50:33
Message-ID:	CAEZATCWqqB3R+Oewq2u_ByS01-12+M4=yRARVvdmx4R0ZO-RvQ@mail.gmail.com
Lists:	pgsql-hackers

On Thu, 20 Jun 2019 at 23:35, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On Thu, Jun 20, 2019 at 06:55:41AM +0100, Dean Rasheed wrote:
>
> >I'm not sure it's easy to justify ordering by Abs(freq-base_freq)/freq
> >though, because that would seem likely to put too much weight on the
> >least commonly occurring values.
>
> But would that be an issue, or a good thing? I mean, as long as the item
> is above mincount, we take the counts as reliable. As I explained, my
> motivation for proposing that was that both
>
> ... (cost=... rows=1 ...) (actual=... rows=1000001 ...)
>
> and
>
> ... (cost=... rows=1000000 ...) (actual=... rows=2000000 ...)
>
> have exactly the same Abs(freq - base_freq), but I think we both agree
> that the first misestimate is much more dangerous, because it's off by six
> orders of magnitude.
>

Hmm, that's a good example. That definitely suggests that we should be
trying to minimise the relative error, but also perhaps that what we
should be looking at is actually just the ratio freq / base_freq,
rather than their difference.
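
To make the comparison concrete, here is a minimal sketch (not code from the patch) that evaluates the three candidate metrics on the two plans quoted above. It assumes the estimated row count stands in for base_freq and the actual row count for freq; the function names are hypothetical and chosen only for this illustration.

```python
# Illustrative only: row counts from the quoted plans are used as stand-ins
# for frequencies (estimated rows ~ base_freq, actual rows ~ freq).

def abs_error(freq, base_freq):
    """Absolute difference, Abs(freq - base_freq)."""
    return abs(freq - base_freq)

def relative_error(freq, base_freq):
    """Difference normalised by freq, Abs(freq - base_freq) / freq."""
    return abs(freq - base_freq) / freq

def ratio(freq, base_freq):
    """Plain ratio, freq / base_freq."""
    return freq / base_freq

# (estimated rows, actual rows) for the two quoted plans
cases = [(1, 1000001), (1000000, 2000000)]

for base_freq, freq in cases:
    print(
        f"est={base_freq} actual={freq} "
        f"abs={abs_error(freq, base_freq)} "
        f"rel={relative_error(freq, base_freq):.6f} "
        f"ratio={ratio(freq, base_freq):.1f}"
    )
```

Under these assumptions the absolute difference is 1000000 in both cases. Abs(freq - base_freq) / freq comes out at roughly 1.0 versus 0.5, because for underestimates it can never exceed 1. The ratio freq / base_freq gives 1000001 versus 2, which is the only one of the three that reflects the six orders of magnitude.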

Regards,
Dean

In response to

Re: Choosing values for multivariate MCV lists at 2019-06-20 22:35:48 from Tomas Vondra

Responses

Re: Choosing values for multivariate MCV lists at 2019-06-22 14:10:52 from Tomas Vondra
