Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-01-22 23:46:07
Message-ID: CAKJS1f-YHJWaTx8pharjcU15j4btuGt1ZCXJmgi8zkALRK96RQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 23 Jan 2019 at 03:43, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> I made another pass over the 0001 patch. I've not read through mcv.c
> again yet. Will try to get to that soon.
>
> 0001-multivariate-MCV-lists-20190117.patch

I started on mcv.c this morning. I'm still trying to build myself a
picture of how it works, but I have noted a few more things while I'm
reading.

24. These macros are still missing parenthesis around the arguments:

#define ITEM_INDEXES(item) ((uint16*)item)
#define ITEM_NULLS(item,ndims) ((bool*)(ITEM_INDEXES(item) + ndims))
#define ITEM_FREQUENCY(item,ndims) ((double*)(ITEM_NULLS(item,ndims) + ndims))

While I don't see any reason to put parenthesis around the macro's
argument when passing it to another macro, since it should do it...
There is a good reason to have the additional parenthesis when it's
not passed to another macro.

Also, there's a number of places, including with these macros that
white space is not confirming to project standard. e.g.
((uint16*)item) should be ((uint16 *) (item)) (including fixing the
missing parenthesis)

25. In statext_mcv_build() I'm trying to figure out what the for loop
does below the comment:

* If we can fit all the items onto the MCV list, do that. Otherwise
* use get_mincount_for_mcv_list to decide which items to keep in the
* MCV list, based on the number of occurences in the sample.

The comment explains only as far as the get_mincount_for_mcv_list()
call so the following is completely undocumented:

for (i = 0; i < nitems; i++)
{
if (mcv_counts[i] < mincount)
{
nitems = i;
break;
}
}

I was attempting to figure out if the break should be there, or if the
code should continue and find the 'i' for the smallest mcv_counts, but
I don't really understand what the code is meant to be doing.

Also: occurences -> occurrences

26. Again statext_mcv_build() I'm a bit puzzled to why mcv_counts
needs to exist at all. It's built from:

mcv_counts = (int *) palloc(sizeof(int) * nitems);

for (i = 0; i < nitems; i++)
mcv_counts[i] = groups[i].count;

Then only used in the loop mentioned in #25 above. Can't you just use
groups[i].count?

(Stopped in statext_mcv_build(). Need to take a break)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-01-23 00:15:40 Re: Typo: llvm*.cpp files identified as llvm*.c
Previous Message Bossart, Nathan 2019-01-22 23:21:32 Re: A few new options for vacuumdb