Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Mark Dilger <hornschnorter(at)gmail(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2017-11-25 23:18:11
Message-ID: 4fc51dca-c165-167b-6468-a668552a4a93@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 11/25/2017 09:23 PM, Mark Dilger wrote:
>
>> On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> Hi,
>>
>> Attached is an updated version of the patch, adopting the psql describe
>> changes introduced by 471d55859c11b.
>>
>> regards
>>
>> --
>> Tomas Vondra http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> <0001-multivariate-MCV-lists.patch.gz><0002-multivariate-histograms.patch.gz>
>
> Hello Tomas,
>
> After applying both your patches, I get a warning:
>
> histogram.c:1284:10: warning: taking the absolute value of unsigned type 'uint32' (aka 'unsigned int') has no effect [-Wabsolute-value]
> delta = fabs(data->numrows);
> ^
> histogram.c:1284:10: note: remove the call to 'fabs' since unsigned values cannot be negative
> delta = fabs(data->numrows);
> ^~~~
> 1 warning generated.
>

Hmm, yeah. The fabs() call is unnecessary, and probably a remnant from
some previous version where the field was not uint32.

I wonder why you're getting the warning and I don't, though. What
compiler are you using?

>
> Looking closer at this section, there is some odd integer vs. floating point arithmetic happening
> that is not necessarily wrong, but might be needlessly inefficient:
>
> delta = fabs(data->numrows);
> split_value = values[0].value;
>
> for (i = 1; i < data->numrows; i++)
> {
> if (values[i].value != values[i - 1].value)
> {
> /* are we closer to splitting the bucket in half? */
> if (fabs(i - data->numrows / 2.0) < delta)
> {
> /* let's assume we'll use this value for the split */
> split_value = values[i].value;
> delta = fabs(i - data->numrows / 2.0);
> nrows = i;
> }
> }
> }
>
> I'm not sure the compiler will be able to optimize out the recomputation of data->numrows / 2.0
> each time through the loop, since the compiler might not be able to prove to itself that data->numrows
> does not get changed. Perhaps you should compute it just once prior to entering the outer loop,
> store it in a variable of integer type, round 'delta' off and store in an integer, and do integer comparisons
> within the loop? Just a thought....
>

Yeah, that's probably right. But I wonder if the loop is needed at all,
or whether we should start at i=(data->numrows/2.0) instead, and walk to
the closest change of value in both directions. That would probably save
more CPU than computing numrows/2.0 only once.

The other issue in that block of code seems to be that we compare the
values using simple inequality. That probably works for passbyval data
types, but we should use proper comparator (e.g. compare_datums_simple).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2017-11-25 23:33:46 Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Previous Message Tom Lane 2017-11-25 22:05:39 Re: Code cleanup patch submission for extended_stats.c