From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | pgsql-committers(at)postgresql(dot)org |
Subject: | pgsql: Implement multivariate n-distinct coefficients |
Date: | 2017-03-24 17:16:40 |
Message-ID: | E1crSpM-0003CM-DI@gemulon.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
Implement multivariate n-distinct coefficients
Add support for explicitly declared statistic objects (CREATE
STATISTICS), allowing collection of statistics on more complex
combinations that individual table columns. Companion commands DROP
STATISTICS and ALTER STATISTICS ... OWNER TO / SET SCHEMA / RENAME are
added too. All this DDL has been designed so that more statistic types
can be added later on, such as multivariate most-common-values and
multivariate histograms between columns of a single table, leaving room
for permitting columns on multiple tables, too, as well as expressions.
This commit only adds support for collection of n-distinct coefficient
on user-specified sets of columns in a single table. This is useful to
estimate number of distinct groups in GROUP BY and DISTINCT clauses;
estimation errors there can cause over-allocation of memory in hashed
aggregates, for instance, so it's a worthwhile problem to solve. A new
special pseudo-type pg_ndistinct is used.
(num-distinct estimation was deemed sufficiently useful by itself that
this is worthwhile even if no further statistic types are added
immediately; so much so that another version of essentially the same
functionality was submitted by Kyotaro Horiguchi:
https://postgr.es/m/20150828.173334.114731693.horiguchi.kyotaro@lab.ntt.co.jp
though this commit does not use that code.)
Author: Tomas Vondra. Some code rework by Álvaro.
Reviewed-by: Dean Rasheed, David Rowley, Kyotaro Horiguchi, Jeff Janes,
Ideriha Takeshi
Discussion: https://postgr.es/m/543AFA15.4080608@fuzzy.cz
https://postgr.es/m/20170320190220.ixlaueanxegqd5gr@alvherre.pgsql
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b
Modified Files
--------------
doc/src/sgml/catalogs.sgml | 97 ++++
doc/src/sgml/func.sgml | 36 +-
doc/src/sgml/ref/allfiles.sgml | 3 +
doc/src/sgml/ref/alter_statistics.sgml | 115 ++++
doc/src/sgml/ref/alter_table.sgml | 9 +-
doc/src/sgml/ref/comment.sgml | 6 +-
doc/src/sgml/ref/create_statistics.sgml | 155 ++++++
doc/src/sgml/ref/drop_statistics.sgml | 98 ++++
doc/src/sgml/reference.sgml | 3 +
src/backend/Makefile | 2 +-
src/backend/catalog/Makefile | 1 +
src/backend/catalog/aclchk.c | 35 ++
src/backend/catalog/dependency.c | 9 +
src/backend/catalog/heap.c | 74 +++
src/backend/catalog/namespace.c | 56 ++
src/backend/catalog/objectaddress.c | 55 ++
src/backend/catalog/pg_shdepend.c | 2 +
src/backend/catalog/system_views.sql | 10 +
src/backend/commands/Makefile | 6 +-
src/backend/commands/alter.c | 8 +
src/backend/commands/analyze.c | 23 +-
src/backend/commands/dropcmds.c | 7 +
src/backend/commands/event_trigger.c | 3 +
src/backend/commands/statscmds.c | 296 ++++++++++
src/backend/nodes/copyfuncs.c | 17 +
src/backend/nodes/equalfuncs.c | 15 +
src/backend/nodes/outfuncs.c | 31 ++
src/backend/optimizer/util/plancat.c | 67 ++-
src/backend/parser/gram.y | 62 ++-
src/backend/statistics/Makefile | 17 +
src/backend/statistics/README | 34 ++
src/backend/statistics/extended_stats.c | 389 +++++++++++++
src/backend/statistics/mvdistinct.c | 671 +++++++++++++++++++++++
src/backend/tcop/utility.c | 12 +
src/backend/utils/adt/ruleutils.c | 81 +++
src/backend/utils/adt/selfuncs.c | 181 +++++-
src/backend/utils/cache/relcache.c | 79 +++
src/backend/utils/cache/syscache.c | 23 +
src/bin/pg_dump/common.c | 4 +
src/bin/pg_dump/pg_backup_archiver.c | 3 +-
src/bin/pg_dump/pg_dump.c | 153 ++++++
src/bin/pg_dump/pg_dump.h | 9 +
src/bin/pg_dump/pg_dump_sort.c | 12 +-
src/bin/psql/describe.c | 51 ++
src/include/catalog/catversion.h | 2 +-
src/include/catalog/dependency.h | 1 +
src/include/catalog/heap.h | 1 +
src/include/catalog/indexing.h | 7 +
src/include/catalog/namespace.h | 2 +
src/include/catalog/pg_cast.h | 4 +
src/include/catalog/pg_proc.h | 11 +
src/include/catalog/pg_statistic_ext.h | 75 +++
src/include/catalog/pg_type.h | 4 +
src/include/catalog/toasting.h | 1 +
src/include/commands/defrem.h | 4 +
src/include/nodes/nodes.h | 2 +
src/include/nodes/parsenodes.h | 15 +
src/include/nodes/relation.h | 19 +
src/include/statistics/extended_stats_internal.h | 64 +++
src/include/statistics/statistics.h | 47 ++
src/include/utils/acl.h | 2 +
src/include/utils/rel.h | 4 +
src/include/utils/relcache.h | 1 +
src/include/utils/syscache.h | 2 +
src/test/regress/expected/alter_generic.out | 45 +-
src/test/regress/expected/object_address.out | 7 +-
src/test/regress/expected/opr_sanity.out | 3 +-
src/test/regress/expected/rules.out | 8 +
src/test/regress/expected/sanity_check.out | 1 +
src/test/regress/expected/stats_ext.out | 155 ++++++
src/test/regress/expected/type_sanity.out | 13 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/alter_generic.sql | 31 ++
src/test/regress/sql/object_address.sql | 4 +-
src/test/regress/sql/stats_ext.sql | 102 ++++
src/test/regress/sql/type_sanity.sql | 2 +-
77 files changed, 3599 insertions(+), 63 deletions(-)
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2017-03-24 17:40:08 | pgsql: Make VACUUM VERBOSE report the number of skipped frozen pages. |
Previous Message | Robert Haas | 2017-03-24 16:43:54 | pgsql: plpgsql: Don't generate parallel plans for RETURN QUERY. |
From | Date | Subject | |
---|---|---|---|
Next Message | Beena Emerson | 2017-03-24 17:33:15 | Re: increasing the default WAL segment size |
Previous Message | Robert Haas | 2017-03-24 17:00:40 | Re: Monitoring roles patch |