Re: [HACKERS] PATCH: multivariate histograms and MCV lists

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Mark Dilger <hornschnorter(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date: 2019-01-24 01:59:50
Message-ID: CAKJS1f_NLyMM7KXHt5+aK7zes-brCk0yRR9=0Us_n0fR6frNWA@mail.gmail.com
Lists: pgsql-hackers

On Wed, 23 Jan 2019 at 12:46, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> (Stopped in statext_mcv_build(). Need to take a break)

Continuing...

27. statext_mcv_build() could declare the int j,k variables in the
scope that they're required in.

28. Missing article; this should read "build an array":

* build array of SortItems for distinct groups and counts matching items

29. No need to set isnull to false in statext_mcv_load().

30. I'm wondering why statext_mcv_serialize() doesn't pass the column's
collation when sorting the array.

You have:

ssup[dim].ssup_collation = DEFAULT_COLLATION_OID;

Should it not be:

ssup[dim].ssup_collation = stats[dim]->attrcollid;
?

31. uint32 should use %u, not %d:

if (mcvlist->magic != STATS_MCV_MAGIC)
    elog(ERROR, "invalid MCV magic %d (expected %d)",
         mcvlist->magic, STATS_MCV_MAGIC);

and

if (mcvlist->type != STATS_MCV_TYPE_BASIC)
    elog(ERROR, "invalid MCV type %d (expected %d)",
         mcvlist->type, STATS_MCV_TYPE_BASIC);

and

ereport(ERROR,
        (errcode(ERRCODE_DATA_CORRUPTED),
         errmsg("invalid length (%d) item array in MCVList",
                mcvlist->nitems)));

I don't think %ld is the correct format for VARSIZE_ANY_EXHDR. %u or
%d seem more suited. I see that value is quite often assigned to int,
so probably can't argue much with %d.

elog(ERROR, "invalid MCV size %ld (expected %zu)",
     VARSIZE_ANY_EXHDR(data), expected_size);

32. I think the format is wrong here too:

elog(ERROR, "invalid MCV size %ld (expected %ld)",
     VARSIZE_ANY_EXHDR(data), expected_size);

I'd expect "invalid MCV size %d (expected %zu)"
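
To illustrate the specifiers suggested in 31 and 32, here is a minimal standalone sketch, not code from the patch. It uses snprintf() in place of elog(), and DEMO_MAGIC plus both function names are hypothetical stand-ins. The magic value is deliberately larger than INT_MAX, which is the case where %d would print a negative number on typical platforms. The size is passed as an int, on the assumption that the caller has already assigned the VARSIZE_ANY_EXHDR() result to an int.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for STATS_MCV_MAGIC; the value is illustrative only. */
#define DEMO_MAGIC 3000000000u

/*
 * Format an "invalid magic" message into buf.  Both values are unsigned
 * 32-bit quantities, so they are cast to unsigned int and printed with %u.
 */
static int
format_magic_error(char *buf, size_t buflen, uint32_t found)
{
    return snprintf(buf, buflen, "invalid MCV magic %u (expected %u)",
                    (unsigned int) found, (unsigned int) DEMO_MAGIC);
}

/*
 * Format an "invalid size" message into buf.  The found size is an int
 * (%d) and the expected size is a size_t (%zu).
 */
static int
format_size_error(char *buf, size_t buflen, int found, size_t expected)
{
    return snprintf(buf, buflen, "invalid MCV size %d (expected %zu)",
                    found, expected);
}
```

The point is only that each specifier matches the type of its argument; the real messages would of course keep using elog() and ereport().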

33. How do you allocate a single chunk non-densely?

* Allocate one large chunk of memory for the intermediate data, needed
* only for deserializing the MCV list (and allocate densely to minimize
* the palloc overhead).

34. I thought I saw a few issues with pg_stats_ext_mcvlist_items(), so
I tried to test it:

create table ab (a int, b int);
insert into ab select x,x from generate_series(1,10)x;
create statistics ab_ab_stat (mcv) on a,b from ab;
analyze ab;
select pg_mcv_list_items(stxmcv) from pg_statistic_ext where stxmcv is not null;
ERROR: cache lookup failed for type 2139062143

The issues I saw were:

You do:
appendStringInfoString(&itemValues, "{");
appendStringInfoString(&itemNulls, "{");

but never append '}' after building the string.

(can use appendStringInfoChar() BTW)

also:

if (i == 0)
{
    appendStringInfoString(&itemValues, ", ");
    appendStringInfoString(&itemNulls, ", ");
}

I'd have expected you to append the ", " only when i > 0.
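
A minimal standalone sketch of the shape I'd expect, assuming plain int values and a fixed-size char buffer in place of StringInfo. The function name build_array_literal() is hypothetical and this is not code from the patch. It writes the opening brace once, emits the separator only before the second and later items, and always appends the closing brace.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a literal of the form "{v0, v1, ...}" into out.
 * Returns 0 on success, or -1 if out is too small (or on a format error).
 */
static int
build_array_literal(char *out, size_t outlen, const int *values, int nvalues)
{
    size_t      used;
    int         n;

    /* opening brace */
    n = snprintf(out, outlen, "{");
    if (n < 0 || (size_t) n >= outlen)
        return -1;
    used = (size_t) n;

    for (int i = 0; i < nvalues; i++)
    {
        /* the separator goes before every item except the first */
        n = snprintf(out + used, outlen - used, "%s%d",
                     (i > 0) ? ", " : "", values[i]);
        if (n < 0 || (size_t) n >= outlen - used)
            return -1;
        used += (size_t) n;
    }

    /* closing brace, which the quoted code never appends */
    n = snprintf(out + used, outlen - used, "}");
    if (n < 0 || (size_t) n >= outlen - used)
        return -1;

    return 0;
}
```

With StringInfo the same structure would use appendStringInfoChar() for the braces and appendStringInfoString() for the separator.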

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
