Re: Choosing values for multivariate MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Choosing values for multivariate MCV lists
Date: 2019-06-25 09:18:19
Message-ID: 20190625091819.uslpopydrkqm3sdf@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 24, 2019 at 02:54:01PM +0100, Dean Rasheed wrote:
>On Mon, 24 Jun 2019 at 00:42, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Sun, Jun 23, 2019 at 10:23:19PM +0200, Tomas Vondra wrote:
>> >On Sun, Jun 23, 2019 at 08:48:26PM +0100, Dean Rasheed wrote:
>> >>On Sat, 22 Jun 2019 at 15:10, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >>>One annoying thing I noticed is that the base_frequency tends to end up
>> >>>being 0, most likely due to getting too small. It's a bit strange, though,
>> >>>because with statistic target set to 10k the smallest frequency for a
>> >>>single column is 1/3e6, so for 2 columns it'd be ~1/9e12 (which I think is
>> >>>something the float8 can represent).
>> >>>
>> >>
>> >>Yeah, it should be impossible for the base frequency to underflow to
>> >>0. However, it looks like the problem is with mcv_list_items()'s use
>> >>of %f to convert to text, which is pretty ugly.
>> >>
>> >
>> >Yeah, I realized that too, eventually. One way to fix that would be
>> >adding %.15f to the sprintf() call, but that just adds ugliness. It's
>> >probably time to rewrite the function to build the tuple from datums,
>> >instead of relying on BuildTupleFromCStrings.
>> >
>>
>> OK, attached is a patch doing this. It's pretty simple, and it does
>> resolve the issue with frequency precision.
>>
>> There's one issue with the signature, though - currently the function
>> returns null flags as bool array, but values are returned as simple
>> text value (formatted in array-like way, but still just a text).
>>
>> In the attached patch I've reworked both to proper arrays, but obviously
>> that'd require a CATVERSION bump - and there's not much apetite for that
>> past beta2, I suppose. So I'll just undo this bit.
>>
>
>Hmm, I didn't spot that the old code was using a single text value
>rather than a text array. That's clearly broken, especially since it
>wasn't even necessarily constructing a valid textual representation of
>an array (e.g., if an individual value's textual representation
>included the array markers "{" or "}").
>
>IMO fixing this to return a text array is worth doing, even though it
>means a catversion bump.
>

Yeah :-(

It used to be just a "debugging" function, but now that we're using it
e.g. in pg_stats_ext definition, we need to be more careful about the
output. Presumably we could keep the text output and make sure it's
escaped properly etc. We could even build an array internally and then
run it through an output function. That'd not require catversion bump.

I'll cleanup the patch changing the function signature. If others think
the catversion bump would be too significant annoyance at this point, I
will switch back to the text output (with proper formatting).

Opinions?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2019-06-25 09:38:43 Re: GiST VACUUM
Previous Message Sergei Kornilov 2019-06-25 08:45:39 Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions