From: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Cc: | Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date: | 2019-01-17 12:04:14 |
Message-ID: | CAKJS1f81TbFHJtfzYYVH19NWroxX4NOyZ2vvd5y+jb6VLr3WRg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
I've started looking over 0002. Here are a few things so far:
1. I think this should be pg_statistic_ext.stxhistogram?
Values of the <type>pg_histogram</type> can be obtained only from the
<literal>pg_statistic.stxhistogram</literal> column.
2. I don't think this bms_copy is needed anymore. I think it was
previously since there were possibly multiple StatisticExtInfo objects
per pg_statistic_ext row, but now it's 1 for 1.
+ info->keys = bms_copy(keys);
naturally, the bms_free() will need to go too.
3. I've not really got into understanding how the new statistics types
are applied yet, but I found this:
* If asked to build both MCV and histogram, first build the MCV part
* and then histogram on the remaining rows.
I guess that means we'll get different estimates with:
create statistic a_stats (mcv,histogram) on a,b from t;
vs
create statistic a_stats1 (mcv) on a,b from t;
create statistic a_stats2 (histogram) on a,b from t;
Is that going to be surprising to people?
4. I guess you can replace "(histogram == NULL);" with "false". The
compiler would likely do it anyway, but...
if (histogram != NULL)
{
/* histogram already is a bytea value, not need to serialize */
nulls[Anum_pg_statistic_ext_stxhistogram - 1] = (histogram == NULL);
values[Anum_pg_statistic_ext_stxhistogram - 1] = PointerGetDatum(histogram);
}
but, hmm. Shouldn't you serialize this, like you are with the others?
5. serialize_histogram() and statext_histogram_deserialize(), should
these follow the same function naming format?
6. IIRC some compilers may warn about this:
if (stat->kinds & requiredkinds)
making it:
if ((stat->kinds & requiredkinds))
should fix that.
UPDATE: Tried to make a few compilers warn about this and failed.
Perhaps I've misremembered.
7. Comment claims function has a parameter named 'requiredkind', but
it no longer does. The comment also needs updated to mention that it
finds statistics with any of the required kinds.
* choose_best_statistics
* Look for and return statistics with the specified 'requiredkind' which
* have keys that match at least two of the given attnums. Return NULL if
* there's no match.
*
* The current selection criteria is very simple - we choose the statistics
* object referencing the most of the requested attributes, breaking ties
* in favor of objects with fewer keys overall.
*
* XXX If multiple statistics objects tie on both criteria, then which object
* is chosen depends on the order that they appear in the stats list. Perhaps
* further tiebreakers are needed.
*/
StatisticExtInfo *
choose_best_statistics(List *stats, Bitmapset *attnums, int requiredkinds)
8. Looking at statext_clauselist_selectivity() I see it calls
choose_best_statistics() passing requiredkinds as STATS_EXT_INFO_MCV |
STATS_EXT_INFO_HISTOGRAM, do you think the function now needs to
attempt to find the best match plus the one with the most statistics
kinds?
It might only matter if someone had:
create statistic a_stats1 (mcv) on a,b from t;
create statistic a_stats2 (histogram) on a,b from t;
create statistic a_stats3 (mcv,histogram) on a,b from t;
Is it fine to just return a_stats1 and ignore the fact that a_stats3
is probably better? Or too corner case to care?
9. examine_equality_clause() assumes it'll get a Var. I see we should
only allow clauses that pass statext_is_compatible_clause_internal(),
so maybe it's worth an Assert(IsA(var, Var)) along with a comment to
mention anything else could not have been allowed.
10. Does examine_equality_clause need 'root' as an argument?
11. UINT16_MAX -> PG_UINT16_MAX
/* make sure we fit into uint16 */
Assert(count <= UINT16_MAX);
(Out of energy for today.)
--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Surafel Temesgen | 2019-01-17 12:14:53 | Re: pg_dump multi VALUES INSERT |
Previous Message | Laurenz Albe | 2019-01-17 10:59:18 | Re: Libpq support to connect to standby server as priority |