Re: multivariate statistics (v19)

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v19)
Date: 2017-01-30 16:12:16
Message-ID: 20170130161216.ihi3nldakztn2shh@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tomas Vondra wrote:
> On 01/03/2017 05:22 PM, Tomas Vondra wrote:
> > On 01/03/2017 02:42 PM, Dilip Kumar wrote:
> ...
> > > I think it should be easily reproducible, in case it's not I can send
> > > call stack or core dump.
> > >
> >
> > Thanks for the report. It was trivial to reproduce and it turned out to
> > be a fairly simple bug. Will send a new version of the patch soon.
> >
>
> Attached is v22 of the patch series, rebased to current master and fixing
> the reported bug. I haven't made any other changes - the issues reported by
> Petr are mostly minor, so I've decided to wait a bit more for (hopefully)
> other reviews.

Hmm. So we have a catalog pg_mv_statistics which stores two things:
1. the configuration regarding mvstats that have been requested by user
via CREATE/ALTER STATISTICS
2. the actual values captured from the above, via ANALYZE

I think this conflates two things that really are separate, given their
different timings and usage patterns. This decision is causing the
catalog to have columns enabled/built flags for each set of stats
requested, which looks a bit odd. In particular, the fact that you have
to heap_update the catalog in order to add more stuff as it's built
looks inconvenient.

Have you thought about having the "requested" bits be separate from the
actual computed values? Something like

pg_mv_statistics
starelid
staname
stanamespace
staowner -- all the above as currently
staenabled array of "char" {d,f,s}
stakeys
// no CATALOG_VARLEN here

where each char in the staenabled array has a #define and indicates one
type, "ndistinct", "functional dep", "selectivity" etc.

The actual values computed by ANALYZE would live in a catalog like:

pg_mv_statistics_values
stvstaid -- OID of the corresponding pg_mv_statistics row. Needed?
stvrelid -- same as starelid
stvkeys -- same as stakeys
#ifdef CATALOG_VARLEN
stvkind 'd' or 'f' or 's', etc
stvvalue the bytea blob
#endif

I think that would be simpler, both conceptually and in terms of code.

The other angle to consider is planner-side: how does the planner gets
to the values? I think as far as the planner goes, the first catalog
doesn't matter at all, because a statistics type that has been enabled
but not computed is not interesting at all; planner only cares about the
values in the second catalog (this is why I added stvkeys). Currently
you're just caching a single pg_mv_statistics row in get_relation_info
(and only if any of the "built" flags is set), which is simple. With my
proposed change, you'd need to keep multiple pg_mv_statistics_values
rows.

But maybe you already tried something like what I propose and there's a
reason not to do it?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2017-01-30 16:21:59 Re: Deadlock in XLogInsert at AIX
Previous Message Stephen Frost 2017-01-30 16:06:03 Re: One-shot expanded output in psql using \G