Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-24 00:17:18
Message-ID: CAKJS1f9o23x-utM3Jk2wji_tQR-CZgyPP5cqQgt1e_zhHMYQpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 24 Mar 2019 at 12:41, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On 3/21/19 4:05 PM, David Rowley wrote:
> > 11. In get_mincount_for_mcv_list() it's probably better to have the
> > numerical literals of 0.0 instead of just 0.
>
> Why?

Isn't it what we do for float and double literals?

>
> > 12. I think it would be better if you modified build_attnums_array()
> > to add an output argument that sets the size of the array. It seems
> > that most places you call this function you perform bms_num_members()
> > to determine the array size.
>
> Hmmm. I've done this, but I'm not sure I like it very much - there's no
> protection the value passed in is the right one, so the array might be
> allocated either too small or too large. I think it might be better to
> make it work the other way, i.e. pass the value out instead.

When I said "that sets the size", I meant "that gets set to the size",
sorry for the confusion. I mean, if you're doing bms_num_members()
inside build_attnums_array() anyway, then this will save you from
having to do that in the callers too.

> > 21. Do you think it would be better to declare
> > pg_stats_ext_mcvlist_items() to accept the oid of the pg_statistic_ext
> > row rather than the stxmcv column? (However, I do see you have a mcv
> > type, so perhaps you might want other types in the future?)
> >
>
> I don't think so, I don't see what advantages would it have.

Okay. I just wanted to ask the question. When I thought of it I had
in mind that it might be possible to carefully craft some bytea value
to have the function crash, but when I tried to I discovered that the
input function for the pg_mcv_list just errors, so it's impossible to
cast a bytea value to pg_mcv_list.

> > 28. Can you explain what this is?
> >
> > uint32 type; /* type of MCV list (BASIC) */
> >
> > I see: #define STATS_MCV_TYPE_BASIC 1 /* basic MCV list type */
> >
> > but it's not really clear to me what else could exist. Maybe the
> > "type" comment can explain there's only one type for now, but more
> > might exist in the future?
> >
>
> It's the same idea as for dependencies/mvdistinct stats, i.e.
> essentially a version number for the data structure so that we can
> perhaps introduce some improved version of the data structure in the future.
>
> But now that I think about it, it seems a bit pointless. We would only
> do that in a major version anyway, and we don't keep statistics during
> upgrades. So we could just as well introduce the version/flag/... if
> needed. We can't do this for regular persistent data, but for stats it
> does not matter.
>
> So I propose we just remove this thingy from both the existing stats and
> this patch.

I see. I wasn't aware that existed for the other types. It certainly
gives some wiggle room if some mistakes were discovered after the
release, but thinking about it, we could probably just change the
"magic" number and add new code in that branch only to ignore the old
magic number, perhaps with a WARNING to analyze the table again. The
magic field seems sufficiently early in the struct that we could do
that. In the master branch we'd just error if the magic number didn't
match, since we wouldn't have to deal with stats generated by the
previous version's bug.

> > 29. Looking at the tests I see you're testing that you get bad
> > estimates without extended stats. That does not really seem like
> > something that should be done in tests that are meant for extended
> > statistics.
> >
>
> True, it might be a bit unnecessary. Initially the tests were meant to
> show old/new estimates for development purposes, but it might not be
> appropriate for regression tests. I don't think it's a big issue, it's
> not like it'd slow down the tests significantly. Opinions?

My thoughts were that if someone did something to improve non-MV
stats, then is it right for these tests to fail? What should the
developer do in the case? update the expected result? remove the test?
It's not so clear.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2019-03-24 01:03:43 Re: SQL statement PREPARE does not work in ECPG
Previous Message Chapman Flack 2019-03-23 23:46:23 Re: The two "XML Fixes" patches still in need of review