|From:||Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>|
|To:||Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com|
|Cc:||dilipbalaut(at)gmail(dot)com, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, dean(dot)a(dot)rasheed(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, robertmhaas(at)gmail(dot)com, ishii(at)postgresql(dot)org, david(at)pgmasters(dot)net, michael(dot)paquier(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, petr(at)2ndquadrant(dot)com, jeff(dot)janes(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org|
|Subject:||Re: multivariate statistics (v19)|
|Views:||Raw Message | Whole Thread | Download mbox|
thanks for the reviews! Attached is v23 of the patch series, addressing
most of the points raised in the reviews.
A quick summary of the changes (I'll respond to the other threads for
points that deserve a bit more detailed discussion):
0) Rebase to current master. The main culprit was the pesky logical
replication patch committed a week ago, because SUBSCRIPTION and
STATISTICS are right next to each other in gram.y, various switches etc.
1) Many typos, mentioned by all the reviewers.
2) I've added a short explanation (in alter_table.sgml) of how ALTER
TABLE ... DROP COLUMN handles multivariate statistics, i.e. that those
are only dropped if there would be a single remaining column.
3) I've reworded 'thoroughly' to 'in more detail' in planstats.sgml, to
make Petr happy ;-)
4) Added missing comments to get_statistics_oid, RelationGetMVStatList,
update_mv_stats, ndistinct_for_combination. Also update_mv_stats() was
not used outside common.c, so I've made it static and removed the
prototype from mvstats.h.
5) I've changed 'statistics does not exist' to 'statistics do not exist'
on a number of places.
6) Removed XXX about checking for duplicates in CreateStatistics. I
agree with Petr that we shouldn't do such checks, as we're not doing
that for other objects (e.g. indexes).
7) I've moved moved the code loading statistics from get_relation_info
into a new function get_relation_statistics, to get rid of the
block, which was there due to mimicking how index details are loaded
without having hasindex-like flag. I like this better than merging the
block into get_relation_info directly.
8) I've changed 'a statistics' to 'multivariate statistics' on a few
places in sgml docs, to make it clear it's not referring to the
'regular' statistics (e.g. at CREATE/DROP STATISTICS, mentioned by
9) I've changed the link in README.dependencies to
https://en.wikipedia.org/wiki/Functional_dependency as proposed by
Ideriha Takeshi. I'm pretty sure the wiki page about database
normalization, referenced by the original link, included a nice
functional dependency example some time ago, but it seems to have
changed and the new link is better.
But perhaps it's not a good idea to link to wikipedia, as the pages
clearly change quite significantly?
10) The CREATE STATISTICS now reports a nice 'already exists' message,
instead of the 'duplicate key', pointed out by Dilip.
11) MVNDistinctItem/MVNDistinctData now use FLEXIBLE_ARRAY_MEMBER for
the array, just like the other structs.
On 01/26/2017 12:01 PM, Kyotaro HORIGUCHI wrote:
> - The k is assumed larger than 1. I think assertion is required.
> - "/* end of the preceding group */" seems to be better if it
> is just after the "if (multi_sort.." currently just after it.
> - The following comment seems mis-edited.
> > * If there is a single are no contradicting rows, count the group
> > * as supporting, otherwise contradicting.
> maybe this would be like the following? The varialbe counting
> the first "contradiction" is named "n_violations". This seems
> somewhat confusing.
> > * If there are no violating rows up to here, count the group
> > * as supporting, otherwise contradicting.
> - "/* first columns match, but the last one does not"
> else if (multi_sort_compare_dims((k - 1), (k - 1), ...
> The above comparison should use multi_sort_compare_dim, not
> - This function counts "n_contradicting_rows" but it is not
> referenced. Anyway n_contradicting_rows = numrows -
> n_supporing_rows so it and n_contradicting seem
Yes, absolutely. This was clearly unnecessary remainder of the original
implementation, and I failed to clean it up after adopting Dean's idea
of continuous dependency degree.
I've also reworked the method a bit, moving handling of the last group
into the main loop (instead of doing that separately right after the
loop, which I think was a bit ugly anyway). Can you check if you're
happy with the code & comments now?
> - struct MVDependencyData/ MVDependenciesData
> The varialbe length member at the last of the structs should
> be defined using FLEXIBLE_ARRAY_MEMBER, from the convention.
Yes, fixed. The other structures already used that macro, but I failed
to notice MVDependencyData/ MVDependenciesData need that fix too.
> - I'm not sure how much it impacts performance, but some
> struct members seems to have a bit too wide types. For
> example, MVDepedenciesData.type is of int32 but it can have
> only '1' for now and it won't be two-digits. Also ndeps
> cannot be so large.
I doubt the impact on performance is measurable, particularly for the
global fields (e.g. nbuckets is tiny compared to the space needed for
the buckets themselves).
But I think you're right we shouldn't use fields wider than actually
needed (e.g. using uint32 for nbuckets is a bit insane, and uint16 would
be just fine). It's not just a matter of performance, but also a way to
document expected values etc.
I'll go through the fields and use smaller data types where appropriate.
> This patch uses int16 as the type of attrubute number but it
> might be better to use AttrNumber for the purpose.
> (Specifically it seems defined as the type for an attribute
> index but also used as the varialbe for number of attributes)
Agreed. Will check with the struct members.
> Sorry for the random comment in advance. I'll learn this further.
Thanks for the review!
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
|Next Message||Alvaro Herrera||2017-01-30 19:16:40||Re: VACUUM's ancillary tasks|
|Previous Message||Nikhil Sontakke||2017-01-30 18:45:33||Re: Speedup twophase transactions|