Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2018-03-10 17:19:07
Message-ID: 20180310171906.gd6vyz6hnou2n3h7@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 0002:

In terms of docs, I think it's better not to have anything user-facing
in the README. Consider that users are going to be reading the HTML
docs only, and many of them may not have the README available at all.
So anything that could be useful to users must be in the XML docs only;
keep in the README only stuff that would be useful to a developer (a
section such as "not yet implemented" would belong there, for example).
Stuff that's in the XML should not appear in the README (because DRY).
For the same reason, having the XML docs end with "see the README" seems
a bad idea to me.

UPDATE_RESULT() is a bit weird to me. I think after staring at it for a
while it looks okay, but why was it such a shock? In 0002 it's only
used in one place so I would suggest to have it expanded, but I see you
use it in 0003 also, three times I think. IMO for clarity it seems
better to just have the expanded code rather than the macro.

find_ext_attnums (and perhaps other places) have references to renamed
columns, "starelid" and others. Also there is this comment:
/* Prepare to scan pg_statistic_ext for entries having indrelid = this rel. */
which is outdated since it uses syscache, not a scan. Just remove the
comment ...

Please add a comment on what does build_attnums() do.

pg_stats_ext_mcvlist_items is odd. I suppose you made it take oid to
avoid having to deal with a malicious bytea? The query in docs is
pretty odd-looking,

SELECT * FROM pg_mcv_list_items((SELECT oid FROM pg_statistic_ext WHERE stxname = 'stts2'));
If we keep the function as is, I would suggest to use LATERAL instead,
SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(oid) m WHERE stxname = 'stts2';
but seems like it should be more like this instead:
SELECT m.* FROM pg_statistic_ext, pg_mcv_list_items(stxmcv) m WHERE stxname = 'stts2';
and not have the output formatting function load the data again from the
table. It'd be a bit like a type-specific UNNEST.

There are a few elog(ERROR) messages. The vast majority seem to be just
internal messages so they're okay, but there is one that should be
ereport:

+ if (total_length > (1024 * 1024))
+ elog(ERROR, "serialized MCV list exceeds 1MB (%ld)", total_length);
I think we have some precedent for better wording, such as
errmsg("index row size %zu exceeds maximum %zu for index \"%s\""
so I would say
errmsg("serialized MCV list size %zu exceedes maximum %zu" )
though I wonder when is this error thrown -- if this is detected during
analyze for example, what happens?

There is this FIXME:
+ * FIXME Should skip already estimated clauses (using the estimatedclauses
+ * bitmap).
Are you planning on implementing this before commit?

There are other FIXMEs also. This in particular caught my attention:

+ /* merge the bitmap into the existing one */
+ for (i = 0; i < mcvlist->nitems; i++)
+ {
+ /*
+ * Merge the result into the bitmap (Min for AND, Max for OR).
+ *
+ * FIXME this does not decrease the number of matches
+ */
+ UPDATE_RESULT(matches[i], or_matches[i], is_or);
+ }

We come back to UPDATE_RESULT again ... and note how the comment makes
no sense unless you know what UPDATE_RESULT does internally. This is
one more indication that the macro is not a great thing to have. Let's
lose it. But while at it, what to do about the FIXME?

You also have this
+ /* XXX syscache contains OIDs of deleted stats (not invalidated) */
+ if (!HeapTupleIsValid(htup))
+ return NULL;
but what does it mean? Is it here to cover for some unknown bug?
Should we maybe not have this at all?

Another XXX comment says
+ * XXX All the memory is allocated in a single chunk, so that the caller
+ * can simply pfree the return value to release all of it.

but I would say just remove the XXX and leave the rest of the comment.

There is another XXX comment that says "this is useless", and I agree.
Just take it all out ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-03-10 18:02:27 Re: FOR EACH ROW triggers on partitioned tables
Previous Message Alexander Korotkov 2018-03-10 17:05:25 Re: [HACKERS] [PATCH] Incremental sort