Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Mark Dilger <hornschnorter(at)gmail(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2017-11-27 16:47:13
Message-ID: e410d526-ab5c-0ca3-672b-6a1876a42ea8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Attached is an updated version of the patch series, fixing the issues
reported by Mark Dilger:

1) Fix fabs() issue in histogram.c.

2) Do not rely on extra_data being StdAnalyzeData, and instead lookup
the LT operator explicitly. This also adds a simple regression tests to
make sure ANALYZE on arrays works fine, but perhaps we should invent
some simple queries too.

3) I've removed / clarified some of the comments mentioned by Mark.

4) I haven't changed how the statistics kinds are defined in relation.h,
but I agree there should be a comment explaining how STATS_EXT_INFO_*
relate to StatisticExtInfo.kinds.

5) The most significant change happened histograms. There used to be two
structures for histograms:

- MVHistogram - expanded (no deduplication etc.), result of histogram
build and never used for estimation

- MVSerializedHistogram - deduplicated to save space, produced from
MVHistogram before storing in pg_statistic_ext and never used for
estimation

So there wasn't really any reason to expose the "non-serialized" version
outside histogram.c. It was just confusing and unnecessary, so I've
moved MVHistogram to histogram.c (and renamed it to MVHistogramBuild),
and renamed MVSerializedHistogram. And same for the MVBucket stuff.

So now we only deal with MVHistogram everywhere, except in histogram.c.

6) I've also made MVHistogram to include a varlena header directly (and
be packed as a bytea), which allows us to store it without having to
call any serialization functions).

I guess if we should do (5) and (6) for the MCV lists too, it seems more
convenient than the current approach. And perhaps even for the
statistics added to 9.6 (it does not change the storage format).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-multivariate-MCV-lists.patch.gz application/gzip 31.5 KB
0002-multivariate-histograms.patch.gz application/gzip 60.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message amul sul 2017-11-27 16:51:26 Re: [HACKERS] Parallel Append implementation
Previous Message Erik Rijkers 2017-11-27 16:40:08 Re: Add RANGE with values and exclusions clauses to the Window Functions