Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Adrien Nayrat <adrien(dot)nayrat(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2017-12-19 19:17:41
Message-ID: 7F76E900-9A20-4640-A431-83B0602C3213@gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers


> On Nov 27, 2017, at 8:47 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> Attached is an updated version of the patch series, fixing the issues
> reported by Mark Dilger:
>
> 1) Fix fabs() issue in histogram.c.
>
> 2) Do not rely on extra_data being StdAnalyzeData, and instead lookup
> the LT operator explicitly. This also adds a simple regression tests to
> make sure ANALYZE on arrays works fine, but perhaps we should invent
> some simple queries too.
>
> 3) I've removed / clarified some of the comments mentioned by Mark.
>
> 4) I haven't changed how the statistics kinds are defined in relation.h,
> but I agree there should be a comment explaining how STATS_EXT_INFO_*
> relate to StatisticExtInfo.kinds.
>
> 5) The most significant change happened histograms. There used to be two
> structures for histograms:
>
> - MVHistogram - expanded (no deduplication etc.), result of histogram
> build and never used for estimation
>
> - MVSerializedHistogram - deduplicated to save space, produced from
> MVHistogram before storing in pg_statistic_ext and never used for
> estimation
>
> So there wasn't really any reason to expose the "non-serialized" version
> outside histogram.c. It was just confusing and unnecessary, so I've
> moved MVHistogram to histogram.c (and renamed it to MVHistogramBuild),
> and renamed MVSerializedHistogram. And same for the MVBucket stuff.
>
> So now we only deal with MVHistogram everywhere, except in histogram.c.
>
> 6) I've also made MVHistogram to include a varlena header directly (and
> be packed as a bytea), which allows us to store it without having to
> call any serialization functions).
>
> I guess if we should do (5) and (6) for the MCV lists too, it seems more
> convenient than the current approach. And perhaps even for the
> statistics added to 9.6 (it does not change the storage format).

I tested your latest patches on my mac os x laptop and got one test
failure due to the results of 'explain' coming up differently. For the record,
I followed these steps:

cd postgresql/
git pull
# this got my directory up to 8526bcb2df76d5171b4f4d6dc7a97560a73a5eff with no local changes
patch -p 1 < ../0001-multivariate-MCV-lists.patch
patch -p 1 < ../0002-multivariate-histograms.patch
./configure --prefix=/Users/mark/master/testinstall --enable-cassert --enable-tap-tests --enable-depend && make -j4 && make check-world

mark

Attachment Content-Type Size
regression.diffs application/octet-stream 10.9 KB
stats_ext.out application/octet-stream 31.0 KB
unknown_filename text/plain 2 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Dilger 2017-12-19 19:38:08 Re: WIP: BRIN multi-range indexes
Previous Message David Fetter 2017-12-19 18:59:18 Re: WIP: a way forward on bootstrap data