Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2017-11-25 21:01:06
Message-ID: B3AF05D4-A312-4988-BC5F-256E756C43CB@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On Nov 18, 2017, at 12:28 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> Attached is an updated version of the patch, adopting the psql describe
> changes introduced by 471d55859c11b.

Hi Tomas,

In src/backend/statistics/dependencies.c, you have introduced a comment:

+ /*
+ * build an array of SortItem(s) sorted using the multi-sort support
+ *
+ * XXX This relies on all stats entries pointing to the same tuple
+ * descriptor. Not sure if that might not be the case.
+ */

Would you mind explaining that a bit more for me? I don't understand exactly what
you mean here, but it sounds like the sort of thing that needs to be clarified/fixed
before it can be committed. Am I misunderstanding this?

In src/backend/statistics/mcv.c, you have comments:

+ * FIXME: Single-dimensional MCV is sorted by frequency (descending). We
+ * should do that too, because when walking through the list we want to
+ * check the most frequent items first.
+ *
+ * TODO: We're using Datum (8B), even for data types (e.g. int4 or float4).
+ * Maybe we could save some space here, but the bytea compression should
+ * handle it just fine.
+ *
+ * TODO: This probably should not use the ndistinct directly (as computed from
+ * the table, but rather estimate the number of distinct values in the
+ * table), no?

Do you intend these to be fixed/implemented prior to committing this patch?

Further down in function statext_mcv_build, you have two loops, the first allocating
memory and the second initializing the memory. There is no clear reason why this
must be done in two loops. I tried combining the two loops into one, and it worked
just fine, but did not look any cleaner to me. Feel free to disregard this paragraph
if you like it better the way you currently have it organized.

Further down in statext_mcv_deserialize, you have some elogs which might need to be
ereports. It is unclear to me whether you consider these deserialize error cases to be
"can't happen" type errors. If so, you might add that fact to the comments rather than
changing the elogs to ereports.

mark

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-11-25 21:33:53 pgbench - add \if support
Previous Message Mark Dilger 2017-11-25 20:25:49 Re: [HACKERS] PATCH: multivariate histograms and MCV lists