Re: multivariate statistics / patch v7

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tomas(dot)vondra(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, jeff(dot)janes(at)gmail(dot)com, sfrost(at)snowman(dot)net
Subject: Re: multivariate statistics / patch v7
Date: 2015-07-03 05:30:34
Message-ID: 20150703.143034.132752108.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, I started to work on this patch.

> attached is v7 of the multivariate stats patch. The main improvement
> is major refactoring of the clausesel.c portion - splitting the
> awfully long spaghetti-style functions into smaller pieces, making it
> much more understandable etc.

Thank you, it looks clearer. I have some comment for the brief
look at this. This patchset is relatively large so I will comment
on "per-notice" basis.. which means I'll send comment before
examining the entire of this patchset. Sorry in advance for the
desultory comments.

=======
General comments:

- You included unnecessary stuffs such like regression.diffs in
these patches.

- Now OID 3307 is used by pg_stat_file. I moved
pg_mv_stats_dependencies_info/show to 3311/3312.

- Single-variate stats have a mechanism to inject arbitrary
values as statistics, that is, get_relation_stats_hook and the
similar stuffs. I want the similar mechanism for multivariate
statistics, too.

0001:

- I also don't think it is right thing for expression_tree_walker
to recognize RestrictInfo since it is not a part of expression.

0003:

- In clauselist_selectivity, find_stats is uselessly called for
single clause. This should be called after the clauselist found
to consist more than one clause.

- Searching vars to be compared with mv-stat columns which
find_stats does should stop at disjunctions. But this patch
doesn't behave so and it should be an unwanted behavior. The
following steps shows that.

====
=# CREATE TABLE t1 (a int, b int, c int);
=# INSERT INTO t1 (SELECT a, a * 2, a * 3 FROM generate_series(0, 9999) a);
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=1 width=12)
=# ALTER TABLE t1 ADD STATISTICS (HISTOGRAM) ON (a, b, c);
=# ANALZYE t1;
=# EXPLAIN SELECT * FROM t1 WHERE a = 1 AND b = 2 OR c = 3;
Seq Scan on t1 (cost=0.00..230.00 rows=268 width=12)
====
Rows changed unwantedly.

It seems not so simple thing as your code assumes.

> I do assume some of those pieces are unnecessary because there already
> is a helper function with the same purpose (but I'm not aware of
> that). But IMHO this piece of code begins to look reasonable
> (especially when compared to the previous state).

Year, such kind of work should be done later:p This patch is
not-so-invasive so as to make it undoable.

> The other major improvement it review of the comments (including
> FIXMEs and TODOs), and removal of the obsolete / misplaced ones. And
> there was plenty of those ...
>
> These changes made this version ~20k smaller than v6.
>
> The patch also rebases to current master, which I assume shall be
> quite stable - so hopefully no more duplicate OIDs for a while.
>
> There are 6 files attached, but only 0002-0006 are actually part of
> the multivariate statistics patch itself. The first part makes it
> possible to use pull_varnos() with expression trees containing
> RestrictInfo nodes, but maybe this is not the right way to fix this
> (there's another thread where this was discussed).

As mentioned above, checking if mv stats can be applied would be
more complex matter than now you are assuming. I also will
consider that.

> Also, the regression tests testing plan choice with multivariate stats
> (e.g. that a bitmap index scan is chosen instead of index scan) fail
> from time to time. I suppose this happens because the invalidation
> after ANALYZE is not processed before executing the query, so the
> optimizer does not see the stats, or something like that.

I saw that occurs, but have no idea how it occurs so far..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2015-07-03 05:34:44 Re: WAL logging problem in 9.4.3?
Previous Message Amit Langote 2015-07-03 05:23:58 Re: [PROPOSAL] VACUUM Progress Checker.