Re: Choosing values for multivariate MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Choosing values for multivariate MCV lists
Date: 2019-06-23 23:42:32
Message-ID: 20190623234232.xkydftfd7uxexelo@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote:
>On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote:
>>On Sat, 22 Jun 2019 at 15:10, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>One annoying thing I noticed is that the base_frequency tends to end up
>>>being 0, most likely due to getting too small. It's a bit strange, though,
>>>because with statistic target set to 10k the smallest frequency for a
>>>single column is 1/3e6, so for 2 columns it'd be ~1/9e12 (which I think is
>>>something the float8 can represent).
>>>
>>
>>Yeah, it should be impossible for the base frequency to underflow to
>>0. However, it looks like the problem is with mcv_list_items()'s use
>>of %f to convert to text, which is pretty ugly.
>>
>
>Yeah, I realized that too, eventually. One way to fix that would be
>adding %.15f to the sprintf() call, but that just adds ugliness. It's
>probably time to rewrite the function to build the tuple from datums,
>instead of relying on BuildTupleFromCStrings.
>

OK, attached is a patch doing this. It's pretty simple, and it does
resolve the issue with frequency precision.

There's one issue with the signature, though - currently the function
returns null flags as bool array, but values are returned as simple
text value (formatted in array-like way, but still just a text).

In the attached patch I've reworked both to proper arrays, but obviously
that'd require a CATVERSION bump - and there's not much apetite for that
past beta2, I suppose. So I'll just undo this bit.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-06-23 23:54:55 Re: Choosing values for multivariate MCV lists
Previous Message Andrew Dunstan 2019-06-23 22:15:06 Plugging some testing holes