Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Mark Dilger <hornschnorter(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-03-26 15:21:08
Message-ID: 20190326152108.GB3088@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 26, 2019 at 01:37:33PM +0000, Dean Rasheed wrote:
>On Tue, 26 Mar 2019 at 11:59, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>>
>> On Mon, 25 Mar 2019 at 23:36, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> >
>> > Attached is an updated patch...
>>
>> I just looked through the latest set of changes and I have a couple of
>> additional review comments:
>>
>
>I just spotted another issue while reading the code:
>
>It's possible to build an MCV list with more than
>STATS_MCVLIST_MAX_ITEMS = 8192 items, which then causes an error when
>the code tries to read it back in:
>
>create temp table foo(a int, b int);
>insert into foo select x,x from generate_series(1,10000) g(x);
>insert into foo select x,x from generate_series(1,10000) g(x);
>alter table foo alter column a set statistics 10000;
>alter table foo alter column b set statistics 10000;
>create statistics s (mcv) on a,b from foo;
>analyse foo;
>select * from foo where a=1 and b=1;
>
>ERROR: invalid length (10000) item array in MCVList
>
>So this needs to be checked when building the MCV list.
>
>In fact, the stats targets for table columns can be as large as 10000
>(a hard-coded constant in tablecmds.c, which is pretty ugly, but
>that's a different matter), so I think STATS_MCVLIST_MAX_ITEMS
>probably ought to match that.
>
>There are also a couple of comments that refer to the 8k limit, which
>would need updating, if you change it.
>

I think we can simply ditch this separate limit, and rely on the
statistics target. The one issue with it is that if we ever allows the
statistics target to we may end up overflowing uint16 (which is used in
the serialized representation). But I think it's OK to just check that in
an assert, or so.

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-03-26 15:22:09 Re: pgsql: Get rid of backtracking in jsonpath_scan.l
Previous Message Christoph Berg 2019-03-26 15:14:46 Re: Enable data checksums by default