Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
To: Mark Dilger <hornschnorter@gmail.com>
Cc: Adrien Nayrat <adrien.nayrat@dalibo.com>, Pg Hackers <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2017-11-26 02:20:26
Message-ID: 933aab0d-fd36-9494-e0eb-79ebc4f57ef5@2ndquadrant.com

On 11/26/2017 02:17 AM, Mark Dilger wrote:
>
>> On Nov 25, 2017, at 3:33 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>>
>>
>> On 11/25/2017 10:01 PM, Mark Dilger wrote:
>>>
>>>> On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Attached is an updated version of the patch, adopting the psql describe
>>>> changes introduced by 471d55859c11b.
>>>
>>> Hi Tomas,
>>>
>>> In src/backend/statistics/dependencies.c, you have introduced a comment:
>>>
>>> + /*
>>> + * build an array of SortItem(s) sorted using the multi-sort support
>>> + *
>>> + * XXX This relies on all stats entries pointing to the same tuple
>>> + * descriptor. Not sure if that might not be the case.
>>> + */
>>>
>>> Would you mind explaining that a bit more for me? I don't understand exactly what
>>> you mean here, but it sounds like the sort of thing that needs to be clarified/fixed
>>> before it can be committed. Am I misunderstanding this?
>>>
>>
>> The call right after that comment is
>>
>> items = build_sorted_items(numrows, rows, stats[0]->tupDesc,
>> mss, k, attnums_dep);
>>
>> That method processes an array of tuples, and the structure is defined
>> by "tuple descriptor" (essentially a list of attribute info - data type,
>> length, ...). We get that from stats[0] and assume all the entries point
>> to the same tuple descriptor. That's a generally safe assumption, I think,
>> because all the stats entries relate to columns from the same table.
>
> Right, I got that, and tried mocking up some code to test that in an Assert.
> I did not pursue that far enough to reach any conclusion, however. You
> seem to be indicating in the comment some uncertainty about whether the
> assumption is safe. Do we need to dig into that further?
>

I don't think it's worth the effort, really. We can't really get
mismatching tuple descriptors here - that could only happen with columns
coming from different tables, or something similarly obscure.

>>>
>>> In src/backend/statistics/mcv.c, you have comments:
>>>
>>> + * FIXME: Single-dimensional MCV is sorted by frequency (descending). We
>>> + * should do that too, because when walking through the list we want to
>>> + * check the most frequent items first.
>>> + *
>>> + * TODO: We're using Datum (8B), even for data types (e.g. int4 or float4).
>>> + * Maybe we could save some space here, but the bytea compression should
>>> + * handle it just fine.
>>> + *
>>> + * TODO: This probably should not use the ndistinct directly (as computed from
>>> + * the table, but rather estimate the number of distinct values in the
>>> + * table), no?
>>>
>>> Do you intend these to be fixed/implemented prior to committing this patch?
>>>
>>
>> Actually, the first FIXME is obsolete, as build_distinct_groups returns
>> the groups sorted by frequency. I'll remove that.
>
> Ok, good. That's the one I understood least.
>
>> I think the rest is more a subject for discussion, so I'd need to hear
>> some feedback.
>
> In terms of storage efficiency, you are using float8 for the frequency, which is consistent
> with what other stats work uses, but may be overkill. A float4 seems sufficient to me.
> The extra four bytes for a float8 may be pretty small compared to the size of the arrays
> being stored, so I'm not sure it matters. Also, this might have been discussed before,
> and I am not asking for a reversal of decisions the members of this mailing list may
> already have reached.
>
> As for using arrays of something smaller than Datum, you'd need some logic to specify
> what the size is in each instance, and that probably complicates the code rather a lot.
> Maybe someone else has a technique for doing that cleanly?
>

Note that this is not about storage efficiency. The comment is placed before
statext_mcv_build, so it's actually about the in-memory representation. If
you look into statext_mcv_serialize, you'll see it uses typlen to copy only
the number of bytes needed for each column.
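To illustrate the difference (a rough sketch, not the actual code from
mcv.c - "info", "ptr" and "value" stand for the per-dimension type info,
the output pointer and the Datum being written):

    if (info->typlen > 0)
    {
        /* fixed-length type (e.g. int4, float4): copy only typlen bytes */
        if (info->typbyval)
            store_att_byval(ptr, value, info->typlen);
        else
            memcpy(ptr, DatumGetPointer(value), info->typlen);
        ptr += info->typlen;
    }
    else if (info->typlen == -1)
    {
        /* varlena type: copy the actual length of this value */
        int    len = VARSIZE_ANY(DatumGetPointer(value));

        memcpy(ptr, DatumGetPointer(value), len);
        ptr += len;
    }

So the serialized representation already avoids storing a full 8-byte Datum
per value; the comment is only about the in-memory build phase.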

>>>
>>> Further down in function statext_mcv_build, you have two loops, the first allocating
>>> memory and the second initializing the memory. There is no clear reason why this
>>> must be done in two loops. I tried combining the two loops into one, and it worked
>>> just fine, but did not look any cleaner to me. Feel free to disregard this paragraph
>>> if you like it better the way you currently have it organized.
>>>
>>
>> I did it this way for readability. I don't think this is a major
>> efficiency issue, as the maximum number of items is fairly limited, and
>> it happens only once at the end of the MCV list build (and the sorts and
>> comparisons are likely much more CPU expensive).
>
> I defer to your judgement here. It seems fine the way you did it.
>
>>> Further down in statext_mcv_deserialize, you have some elogs which might need to be
>>> ereports. It is unclear to me whether you consider these deserialize error cases to be
>>> "can't happen" type errors. If so, you might add that fact to the comments rather than
>>> changing the elogs to ereports.
>>>
>>
>> I might be missing something, but why would ereport be more appropriate
>> than elog? Ultimately, there's not much difference between elog(ERROR)
>> and ereport(ERROR) - both will cause a failure.
>
> I understand project policy to allow elog for error conditions that will be reported
> in "can't happen" type situations, similar to how an Assert would be used. For
> conditions that can happen through (mis)use by the user, ereport is appropriate.
> Not knowing whether you thought these elogs were reporting conditions that a
> user could cause, I did not know if you should change them to ereports, or if you
> should just add a brief comment along the lines of /* should not be possible */.
>
> I may misunderstand project policy. If so, I'd gratefully accept correction on this
> matter.
>

I don't know - I've always considered "elog" to be the old interface and
"ereport" the new one. In any case, those are "should not happen" cases -
hitting one of them would mean some sort of data corruption, or something
along those lines.
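Just to illustrate the distinction as I understand it (generic examples -
the names and the limit are made up, not taken from the patch):

    /* internal "should not happen" check - elog seems fine here */
    if (ptr == NULL)
        elog(ERROR, "unexpected NULL pointer in MCV deserialization");

    /* user-facing error - ereport with an SQLSTATE and a proper message */
    if (nitems > MAX_MCV_ITEMS)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("number of MCV items exceeds the maximum (%d)",
                        MAX_MCV_ITEMS)));

I can add a brief /* should not happen */ comment to those elogs, if that
makes the intent clearer.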

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
