Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-23 23:41:25
Message-ID: fd4c5223-6c1a-c2d5-c0e0-53db789a0bdc@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/21/19 4:05 PM, David Rowley wrote:
> On Mon, 18 Mar 2019 at 02:18, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> Yes, it was using the toasted value directly. The attached patch
>> detoasts the value explicitly, similarly to the per-column stats, and it
>> also removes the 1MB limit.
>
> I just made a pass over 0001 and 0002.
>
> 0002 is starting to look pretty good, but I did note down a few things
> while looking. Some things below might just me being unclear on how
> something works. Perhaps that means more comments are needed, but it
> might also mean I need a brain upgrade. I'm hoping it's the former.
>

That's good to hear. Thanks for the review.

> 0001:
>
> 1. Could you write a full commit message for this patch. Without
> reading the backlog on this ticket it's not all that obvious what the
> patch aims to fix. (I have read the backlog, so I know, but the next
> person might not have)
>
> 2. Should all the relpages variables be BlockNumber rather than double?
>

Probably. But I think the conclusion from the discussion with Dean was
that tweaking the relpages/reltuples reset should really be a matter for
a separate patch. So I've removed it from this patch series and the
tests were modified to check the stats are still there.

> 0002:
>
> 3. I'm not sure what the following is trying to say:
>
> * Estimate selectivity on any clauses applicable by stats tracking
> * actual values first, then apply functional dependencies on the
> * remaining clauses.
>
> can you reword it?
>

It was supposed to say we first try to apply the more complicated stats
(those that track dependencies between values) before applying the
simpler ones that only track dependencies between columns. I've reworked
and simplified comments in this part of the code.

> 4. This seems out of date:
>
> * clauses that we've already estimated for. Each selectivity
> * function will set the appropriate bit in the bitmapset to mark that
> * no further estimation is required for that list item.
>
> We're only passing estimatedclauses to 1 function before
> clauselist_selectivity_simple is called for the remainder.
>

True. I've simplified/reworded this. The old wording was mostly a
residue of how this worked in previous patch versions.

> 5. In build_attnums_array there's
> Assert(AttrNumberIsForUserDefinedAttr(j)); I just wanted to point out
> that this could only possibly trigger of the bitmapset had a 0 member.
> It cannot have negative members. Maybe it would be worth adding a
> comment to acknowledge that as it looks a bit misguided otherwise.
>

Right. I've added an explanation, and another assert checking the
maximum value (because bitmaps store integers, but we only expect
attnums here).

> 6. In build_attnums_array(), what's the reason to return int *, rather
> than an AttrNumber * ? Likewise in the code that calls that function.
>

Laziness, I guess. Also, bitmaps work with int members, so it was kinda
natural. But you're right AttrNumber is a better choice, so fixed.

> 7. Not properly indented. Should be two tabs.
>
> * build sorted array of SortItem with values from rows
>
> Should also be "a sorted array"
>

Fixed.

> 8. This comment seems to duplicate what is just mentioned in the
> header comment for the function.
>
> /*
> * We won't allocate the arrays for each item independenly, but in one
> * large chunk and then just set the pointers. This allows the caller to
> * simply pfree the return value to release all the memory.
> */
>
> Also, typo "independenly" -> "independently"
>

Fixed. I've removed this comment, the function comment is enough.

> 9. Not properly indented:
>
> /*
> * statext_is_compatible_clause_internal
> * Does the heavy lifting of actually inspecting the clauses for
> * statext_is_compatible_clause. It needs to be split like this because
> * of recursion. The attnums bitmap is an input/output parameter collecting
> * attribute numbers from all compatible clauses (recursively).
> */
>

Fixed. It might be a tad too similar to statext_is_compatible_clause
comment, though.

> 10. Header comment for get_mincount_for_mcv_list() ends with
> *---------- but does not start with that.
>

Fixed.

> 11. In get_mincount_for_mcv_list() it's probably better to have the
> numerical literals of 0.0 instead of just 0.
>

Why?

> 12. I think it would be better if you modified build_attnums_array()
> to add an output argument that sets the size of the array. It seems
> that most places you call this function you perform bms_num_members()
> to determine the array size.
>

Hmmm. I've done this, but I'm not sure I like it very much - there's no
protection the value passed in is the right one, so the array might be
allocated either too small or too large. I think it might be better to
make it work the other way, i.e. pass the value out instead.

> 13. This comment seems to be having a fight with itself:
>
> * Preallocate Datum/isnull arrays (not as a single chunk, as we will
> * pass the result outside and thus it needs to be easy to pfree().
> *
> * XXX On second thought, we're the only ones dealing with MCV lists,
> * so we might allocate everything as a single chunk to reduce palloc
> * overhead (chunk headers, etc.) without significant risk. Not sure
> * it's worth it, though, as we're not re-building stats very often.
>

Yes, I've reworded/simplified the comment.

> 14. The following might be easier to read if you used a local variable
> instead of counts[dim].
>
> for (i = 0; i < mcvlist->nitems; i++)
> {
> /* skip NULL values - we don't need to deduplicate those */
> if (mcvlist->items[i]->isnull[dim])
> continue;
>
> values[dim][counts[dim]] = mcvlist->items[i]->values[dim];
> counts[dim] += 1;
> }
>
> Then just assign the value of the local variable to counts[dim] at the end.
>

I've tried that, but it didn't seem like an improvement so I've kept the
current code.

> 15. Why does this not use stats[dim]->attrcollid ?
>
> ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;
>

Hmmm, that's a good question. TBH I don't recall why I used the default
collation here, but I think it's mostly harmless because it's used only
during serialization. But I'll check, it seems suspicious.

But that made me revisit how collations are handled when building the
MCV list, and I see it's using type->typcollation, which I seems wrong
as the column might use a different collation.

But if this is wrong, it's already wrong in dependencies and mvdistinct
statistics ...

> 16. The following:
>
> else if (info[dim].typlen == -2) /* cstring */
> {
> info[dim].nbytes = 0;
> for (i = 0; i < info[dim].nvalues; i++)
> {
> values[dim][i] = PointerGetDatum(PG_DETOAST_DATUM(values[dim][i]));
> info[dim].nbytes += strlen(DatumGetCString(values[dim][i]));
> }
> }
>
> seems to conflict with:
>
> else if (info[dim].typlen == -2) /* cstring */
> {
> memcpy(data, DatumGetCString(v), strlen(DatumGetCString(v)) + 1);
> data += strlen(DatumGetCString(v)) + 1; /* terminator */
> }
>
> It looks like you'll reserve 1 byte too few for each cstring value.
>
> (Might also be nicer to assign the strlen to a local variable rather
> than leave it up to the compiler to optimize out the 2nd strlen call
> in the latter of the two code fragments above.)
>

Good catch! Fixed.

> 17. I wonder if some compilers will warn about this:
>
> ITEM_INDEXES(item)[dim] = (value - values[dim]);
>
> Probably a cast to uint16 might fix them if they do.
>

Possibly. I've added the explicit cast.

> 18. statext_mcv_deserialize: I don't think "Size" should have a
> capaital 'S' here:
>
> elog(ERROR, "invalid MCV Size %ld (expected at least %zu)",
> VARSIZE_ANY_EXHDR(data), offsetof(MCVList, items));
>
> Also, the following should likely use the same string to reduce the
> number of string constants:
>
> elog(ERROR, "invalid MCV size %ld (expected %zu)",
> VARSIZE_ANY_EXHDR(data), expected_size);
>

Yeah, it should have been "size". But I don't think reusing the same
string is a good idea, because those are two separate/different issues.

> 19. statext_mcv_deserialize: There seems to be a mix of ereports and
> elogs for "shouldn't happen" cases. Any reason to use ereport instead
> of elog for these?
>
> I also really wonder if you need so many different error messages. I
> imagine if anyone complains about hitting this case then we'd just be
> telling them to run ANALYZE again.
>

Yeah, it seems a bit of a mess. As those are really "should not happen"
issues, likely caused by some form of data corruption, I think we can
reduce it to fewer checks with one or two error messages.

> 20. Isn't this only needed for modules?
>
> PG_FUNCTION_INFO_V1(pg_stats_ext_mcvlist_items);
>

Yep, fixed.

> 21. Do you think it would be better to declare
> pg_stats_ext_mcvlist_items() to accept the oid of the pg_statistic_ext
> row rather than the stxmcv column? (However, I do see you have a mcv
> type, so perhaps you might want other types in the future?)
>

I don't think so, I don't see what advantages would it have.

> 22. I see lots of usages of DEFAULT_COLLATION_OID in
> mcv_get_match_bitmap. Can you add a comment to explain why that's
> okay? I imagined the collation should match the column's collation.
>

Yeah, same thing as above. Have to check.

> 23. Are these comments left over from a previous version?
>
> /* OR - was MATCH_NONE, but will be MATCH_FULL */
> /* AND - was MATC_FULL, but will be MATCH_NONE */
> /* if the clause mismatches the MCV item, set it as MATCH_NONE */
>

Fixed.

> 24. I think the following comment needs explained a bit better:
>
> /*
> * mcv_clauselist_selectivity
> * Return the selectivity estimate of clauses using MCV list.
> *
> * It also produces two interesting selectivities - total selectivity of
> * all the MCV items combined, and selectivity of the least frequent item
> * in the list.
> */
> Selectivity
> mcv_clauselist_selectivity(PlannerInfo *root, StatisticExtInfo *stat,
> List *clauses, int varRelid,
> JoinType jointype, SpecialJoinInfo *sjinfo,
> RelOptInfo *rel,
> Selectivity *basesel, Selectivity *totalsel)
>
> I see 3 possible selectivities. What's different with *totalsel and
> the return value of the function?
>
> (I can see from looking at the actual code that it's not, but I don't
> really know why it has to be different)
>

Well, it returns the selectivity estimate (matching the clauses), and
then two additional selectivities:

1) total - a sum of frequencies for all MCV items (essentially, what
fraction of data is covered by the MCV list), which is then used to
estimate the non-MCV part

2) base - a sum of base frequencies for matching items (which is used
for correction of the non-MCV part)

I'm not sure I quite understand what's unclear here.

> 25. In README.mcv, I don't quite understand this:
>
> TODO Currently there's no logic to consider building only an MCV list (and not
> building the histogram at all), except for doing this decision manually in
> ADD STATISTICS.
>
> Not sure why histograms are mentioned and also not sure what ADD STATISTICS is.
>

Yeah, that's obsolete. Removed.

> 26. I don't quite understand the "to defend against malicious input" part in.
>
> It accepts one parameter - a pg_mcv_list value (which can only be obtained
> from pg_statistic_ext catalog, to defend against malicious input), and
> returns these columns:
>
> It kinda sounds like there's some sort of magic going on to ensure the
> function can only be called using stxmcv, but it's just that it
> requires a pg_mcv_list type and that type has an input function that
> just errors out, so it could only possibly be set from C code.
>

Yeah, the idea is that if it was possible to supply arbitrary binary
data as a MCV list, someone could inject arbitrarily broken value. By
only allowing values from the catalog (which we must have built) that's
no longer an issue.

> 27. This looks like an unintended change:
>
> /*
> - * Get the numdistinct estimate for the Vars of this rel. We
> - * iteratively search for multivariate n-distinct with maximum number
> - * of vars; assuming that each var group is independent of the others,
> - * we multiply them together. Any remaining relvarinfos after no more
> - * multivariate matches are found are assumed independent too, so
> - * their individual ndistinct estimates are multiplied also.
> + * Get the numdistinct estimate for the Vars of this rel.
> + *
> + * We iteratively search for multivariate n-distinct with the maximum
> + * number of vars; assuming that each var group is independent of the
> + * others, we multiply them together. Any remaining relvarinfos after
> + * no more multivariate matches are found are assumed independent too,
> + * so their individual ndistinct estimates are multiplied also.
> *
>

Right. Reverted.

> 28. Can you explain what this is?
>
> uint32 type; /* type of MCV list (BASIC) */
>
> I see: #define STATS_MCV_TYPE_BASIC 1 /* basic MCV list type */
>
> but it's not really clear to me what else could exist. Maybe the
> "type" comment can explain there's only one type for now, but more
> might exist in the future?
>

It's the same idea as for dependencies/mvdistinct stats, i.e.
essentially a version number for the data structure so that we can
perhaps introduce some improved version of the data structure in the future.

But now that I think about it, it seems a bit pointless. We would only
do that in a major version anyway, and we don't keep statistics during
upgrades. So we could just as well introduce the version/flag/... if
needed. We can't do this for regular persistent data, but for stats it
does not matter.

So I propose we just remove this thingy from both the existing stats and
this patch.

> 29. Looking at the tests I see you're testing that you get bad
> estimates without extended stats. That does not really seem like
> something that should be done in tests that are meant for extended
> statistics.
>

True, it might be a bit unnecessary. Initially the tests were meant to
show old/new estimates for development purposes, but it might not be
appropriate for regression tests. I don't think it's a big issue, it's
not like it'd slow down the tests significantly. Opinions?

This patch version also does two additional changes:

1) It moves the "single-clause" optimization until after the extended
statistics are applied. This addresses the issue I've explained in [1].

2) The other change is ignoring values that exceed WIDTH_THRESHOLD, as
proposed by Dean in [2]. The idea is similar to per-column stats, so
I've used the same value (1024). It turned out to be pretty simple
change in build_sorted_items, which means it affects both the old and
new statistic types (which is correct).

FWIW while looking at the code I think the existing statistics may be
broken for toasted values, as there's not a single detoast call. I'll
investigate/fix once this commitfest is over.

[1]
https://www.postgresql.org/message-id/d207e075-9fb3-3a95-7811-8e0ab5292b2a%402ndquadrant.com

[2]
https://www.postgresql.org/message-id/CAEZATCVazGRDjbZpRF6r-Asiv_-U8vcT-VA0oSZribhhmDUQHQ%40mail.gmail.com

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-multivariate-MCV-lists.patch.gz application/gzip 39.9 KB
0002-multivariate-histograms.patch.gz application/gzip 48.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2019-03-23 23:46:23 Re: The two "XML Fixes" patches still in need of review
Previous Message Peter Geoghegan 2019-03-23 23:08:42 Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint