Re: multivariate statistics (v19)

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v19)
Date: 2016-12-30 13:05:18
Message-ID: 121f2427-d8c1-9494-1c09-b41b6dd8276a@2ndquadrant.com
Lists: pgsql-hackers

On 12/12/16 22:50, Tomas Vondra wrote:
>> +<programlisting>
>> +SELECT pg_mv_stats_dependencies_show(stadeps)
>> + FROM pg_mv_statistic WHERE staname = 's1';
>> +
>> + pg_mv_stats_dependencies_show
>> +-------------------------------
>> + (1) => 2, (2) => 1
>> +(1 row)
>> +</programlisting>
>>
>> Couldn't this somehow show actual column names, instead of attribute
>> numbers?
>>
>
> Yeah, I was thinking about that too. The trouble is that's table-level
> metadata, so we don't have that kind of info serialized within the data
> type (e.g. because it would not handle column renames etc.).
>
> It might be possible to explicitly pass the table OID as a parameter of
> the function, but it seemed a bit ugly to me.

I think it makes sense to have such a function. This is not an output
function, so I think it's OK for it to take the OID as input, especially
since in the use case shown above you can easily use starelid.
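
Roughly what I have in mind, as a sketch only: the two-argument form of
pg_mv_stats_dependencies_show() below is hypothetical and does not exist in
the patch as posted. The second query is a manual fallback that assumes the
pg_mv_statistic catalog from the patch and uses the regular pg_attribute
catalog to translate attribute numbers into column names.

```sql
-- Hypothetical variant (an assumption, not in the posted patch): pass the
-- table OID so the function can print column names instead of attnums.
SELECT pg_mv_stats_dependencies_show(stadeps, starelid)
  FROM pg_mv_statistic WHERE staname = 's1';

-- Manual fallback: list attnum -> column name for the table the statistics
-- are defined on, skipping system columns and dropped columns.
SELECT a.attnum, a.attname
  FROM pg_mv_statistic s
  JOIN pg_attribute a ON a.attrelid = s.starelid
 WHERE s.staname = 's1'
   AND a.attnum > 0
   AND NOT a.attisdropped
 ORDER BY a.attnum;
```

Because the names are looked up at call time rather than stored in the
serialized value, column renames would not be a problem.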

>
> FWIW, as I wrote in this thread, the place where this patch series needs
> feedback most desperately is integration into the optimizer. Currently
> all the magic happens in clausesel.c and does not leave it. I think it
> would be good to move some of that (particularly the choice of
> statistics to apply) to an earlier stage, and store the information
> within the plan tree itself, so that it's available outside clausesel.c
> (e.g. for EXPLAIN - showing which stats were picked seems useful).
>
> I was thinking it might work similarly to the foreign key estimation
> patch (100340e2). It might even be more efficient, as the current code
> may end up repeating the selection of statistics multiple times. But
> enriching the plan tree turned out to be way more invasive than I'm
> comfortable with (but maybe that'd be OK).
>

In theory this seems like a reasonable approach to me, mainly because
mv statistics are user-defined objects. I guess we'd have to see at least
a PoC to judge how invasive it is. Ultimately, though, I think feedback
from a committer who is more familiar with the planner is needed here.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
