Re: WIP: multivariate statistics / proof of concept

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "David Rowley" <dgrowleyml(at)gmail(dot)com>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "Petr Jelinek" <petr(at)2ndquadrant(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: multivariate statistics / proof of concept
Date: 2014-10-30 10:29:39
Message-ID: 52104c929305f15412596e6ac7ca0426.squirrel@2.emaily.eu
Lists: pgsql-hackers

On 30 October 2014, 10:17, David Rowley wrote:
> On Thu, Oct 30, 2014 at 12:48 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>
>> On 29 October 2014, 12:31, Petr Jelinek wrote:
>> >> I've not really gotten around to looking at the patch yet, but I'm also
>> >> wondering if it would be simple to include allowing functional statistics
>> >> too. The pg_mv_statistic name seems to indicate multi columns, but how
>> >> about stats on date(datetime_column), or perhaps any non-volatile
>> >> function. This would help to solve the problem highlighted here
>> >> http://www.postgresql.org/message-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9Uf34DhQ@mail.gmail.com
>> >> . Without giving it too much thought, perhaps any expression that can be
>> >> indexed should be allowed to have stats? Would that be really difficult
>> >> to implement in comparison to what you've already done with the patch so
>> >> far?
>> >>
>> >
>> > I would not over-complicate requirements for the first version of this,
>> > I think it's already complicated enough.
>>
>> My thoughts, exactly. I'm not willing to put more features into the
>> initial version of the patch. Actually, I'm thinking about ripping out
>> some experimental features (particularly "hashed MCV" and "associative
>> rules").
>>
>>
> That's fair, but I didn't really mean to imply that you should go work on
> that too and that it should be part of this patch.
> I was thinking more along the lines that I don't really agree with the
> table name for the new stats, and that at some later date someone will want
> to add expression stats, so we'd probably better come up with a design that
> would be friendly towards that. At this time I can only think that the name
> of the table might not suit expression stats well. I'd hate to see someone
> have to invent a 3rd table to support these when we could likely come up
> with something that could be extended later and still make sense both today
> and in the future.
>
> I was just looking at how expression indexes are stored in pg_index, and I
> see that for an expression index the expression is stored in the indexprs
> column, which is of type pg_node_tree. So quite possibly at some point in
> the future the new stats table could just have an extra column added, and
> for today we'd just need to come up with a future-proof name... Perhaps
> pg_statistic_ext or pg_statisticx, and name functions and source files
> something along those lines instead?

Ah, OK. I don't think the catalog name "pg_mv_statistic" is inappropriate
for this purpose, though. IMHO "multivariate" does not mean "only columns"
or "no expressions"; it simply says that the approximated density function
has multiple input variables, be they attributes or expressions.
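
To illustrate the point, here is a minimal Python sketch. It is not code from
the patch: the sample rows and the joint_frequencies helper are made up for
this example. Each "variable" is simply a function of the row, so a plain
attribute and an expression such as date(created_at) are treated the same
way when the joint distribution is built.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (created_at, status). Illustration only.
rows = [
    (datetime(2014, 10, 29, 9, 0), "new"),
    (datetime(2014, 10, 29, 17, 30), "new"),
    (datetime(2014, 10, 30, 8, 15), "done"),
    (datetime(2014, 10, 30, 12, 0), "done"),
]

# A "variable" is any function of the row: an attribute or an expression.
variables = [
    lambda r: r[0].date(),  # expression: date(created_at)
    lambda r: r[1],         # attribute: status
]


def joint_frequencies(rows, variables):
    """Return the empirical joint distribution over the given variables."""
    counts = Counter(tuple(v(r) for v in variables) for r in rows)
    total = len(rows)
    return {key: n / total for key, n in counts.items()}


joint = joint_frequencies(rows, variables)
```

In this made-up sample the expression and the attribute are perfectly
correlated, which is the kind of dependency that per-column statistics
cannot capture.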

But maybe there's a better name.

Tomas
