Re: Clamping reulst row number of joins.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: sfrost(at)snowman(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Clamping reulst row number of joins.
Date: 2015-03-09 07:38:56
Message-ID: 20150309.163856.158844251.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, thank you for the considerations by all of you, but I have
described incorrectly the situation. I'm terribly sorry for the
imprecise description and for your additional work based on it.

The point of this issue is not VAULES but the nature of Append
and NestLoop. Nested Loop can fail to have correct estimation
when it contains Apeend node on insufficiently analyzed
inhertance parent or UNION ALL. The latter is not saved by Tom's
patch nor proper analyzing.

Let me try to redescribe the issue.

At Sun, 08 Mar 2015 19:30:36 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <21519(dot)1425857436(at)sss(dot)pgh(dot)pa(dot)us>
> I wrote:
> >> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> >>> I've certainly seen and used values() constructs in joins for a variety
> >>> of reasons and I do think it'd be worthwhile for the planner to know how
> >>> to pull up a VALUES construct.

The query reported to me didn't included VALUES. I implicitly
used it as an shortcut to make Append node that cannot retrive
statistics on itself. It was the main cause of this confision.

Addition to that, there was somehow no statistics on the
inheritance parent (not as a child), which I didn't mentioned. I
don't know how the situation was made but I can guess manual
analyzes as a cause.

After all, the exact situation will emerge by following steps.

CREATE TABLE p (a int);
CREATE TABLE c1 (like p) INHERITS (p);
INSERT INTO c1 (SELECT (a%2)*500000 + a / 2 FROM generate_series(0, 1000000) a);
CREATE INDEX i_c1_a on c1 (a);
CREATE TABLE t1 (a int);
INSERT INTO t1 VALUES (1000000);
DELETE FROM pg_statistics;
ANALYZE t1;
ANALYZE c1;
SET join_collapse_limit TO 1;
SET random_page_cost TO 8.0;
SET seq_page_cost TO 0.5;
EXPLAIN analyzE SELECT a FROM p t WHERE a IN (select a from t1);
-----------------------------------------------------------------
> Nested Loop (cost=1.01..9.49 rows=500001 width=4)
> (actual time=0.041..0.041 rows=0 loops=1)
> -> HashAggregate (cost=1.01..1.02 rows=1 width=4)
> (actual time=0.017..0.017 rows=1 loops=1)
> Group Key: t1.a
> -> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=4)
> (actual time=0.007..0.009 rows=1 loops=1)
> -> Append (cost=0.00..8.45 rows=2 width=4)
> (actual time=0.018..0.018 rows=0 loops=1)
> -> Seq Scan on p t (cost=0.00..0.00 rows=1 width=4)
> (actual time=0.000..0.000 rows=0 loops=1)
> Filter: (t1.a = a)
> -> Index Only Scan using i_c1_a on c1 t_1
> (cost=0.42..8.45 rows=1 width=4)
> (actual time=0.012..0.012 rows=0 loops=1)
> Index Cond: (a = t1.a)
> Heap Fetches: 0
> Planning time: 0.408 ms
> Execution time: 0.135 ms
> (12 rows)

The nested loop above decided that rows to be 500001 because
seleqfunc_semi couldn't get MVC list for p (as the parent) and
returns 0.5 as the default join selectivity.

Therefore, ANALYZE p makes this work sane for the case.

ANALYZE p;

EXPLAIN analyzE SELECT a FROM p t WHERE a IN (select a from t1);
-----------------------------------------------------------------
> Nested Loop (cost=1.01..9.49 rows=1 width=4)

I thought that the case of apparently wrong row numbers would be
saved by clampling the number of result rows as I mentioned but
since it is saved by the proper analyzes, the necessity is rather
low if analyze works always. Still, the following example cannot
be saved even by analyze.

CREATE TABLE c2 (LIKE p);
ANALYZE p;
EXPLAIN analyzE select a FROM (SELECT a FROM c1 UNION ALL SELECT a from c2) t
WHERE a IN (select a from t1);
QUERY PLAN
---------------------------------------------------------------------------
Nested Loop (cost=1.44..51.48 rows=501276 width=4)
(actual time=0.046..0.046 rows=0 loops=1)
-> HashAggregate (cost=1.01..1.02 rows=1 width=4)
(actual time=0.018..0.019 rows=1 loops=1)
Group Key: t1.a
-> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=4)
(actual time=0.008..0.010 rows=1 loops=1)
-> Append (cost=0.42..50.32 rows=14 width=4)
(actual time=0.023..0.023 rows=0 loops=1)
-> Index Only Scan using i_c1_a on c1
(cost=0.42..8.45 rows=1 width=4)
(actual time=0.016..0.016 rows=0 loops=1)
Index Cond: (a = t1.a)
Heap Fetches: 0
-> Seq Scan on c2 (cost=0.00..41.88 rows=13 width=4)
(actual time=0.001..0.001 rows=0 loops=1)
Filter: (t1.a = a)
Planning time: 0.384 ms
Execution time: 0.185 ms
(12 rows)

14 * 1 = 501276... Of course this cannot be saved by the
values-flatten patch unfortunately...

I think this is a not so rare case..

==============================================================
> > I spent a bit of time looking at this, and realized that the blocker
> > is the same as the reason why we don't pull up sub-selects with empty
> > rangetables ("SELECT expression(s)"). Namely, that the upper query's
> > jointree can't handle a null subtree. (This is not only a matter of
> > the jointree data structure, but the fact that the whole planner
> > identifies relations by bitmapsets of RTE indexes, and subtrees with
> > empty RTE sets couldn't be told apart.)
>
> > We could probably fix both cases for order-of-a-hundred lines of new code
> > in prepjointree. The plan I'm thinking about is to allow such vacuous
> > subquery jointrees to be pulled up, but only if they are in a place in
> > the upper query's jointree where it's okay to delete the subtree. This
> > would basically be two cases: (1) the immediate parent is a FromExpr that
> > would have at least one remaining child, or (2) the immediate parent is
> > an INNER JOIN whose other child isn't also being deleted (so that we can
> > convert the JoinExpr to a nonempty FromExpr, or just use the other child
> > as-is if the JoinExpr has no quals).
>
> Here's a draft patch along those lines. Unsurprisingly, it changes the
> plans generated for a number of regression-test queries. In most cases
> I felt it desirable to force the old plan shape to be retained (by
> inserting "offset 0" or equivalent) because the test was trying to test
> proper generation of a query plan of that shape. I did add a couple
> cases where the optimization was allowed to go through.
>
> The patch is a bit bigger than I'd hoped (a net of about 330 lines added
> to prepjointree.c), but it's not hugely ugly, and it doesn't add any
> meaningful overhead in cases where no optimization happens. Barring
> objections I will commit this.

However, I feel I'm in charge of looking this:) as far as I can.

The patch removes inner joins which contain a child which can be
treated as a value converting the join condition into a simple(?)
qual. The intension looks (as far as I can see) ok and to work as
intended.

It works for very limited simple cases, inner join with VALUES
with one values list, inner join with subselect without from list
and having stable tlist. As such, my first example,

> EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
> JOIN (VALUES (1)) tt(x) ON tt.x = t.a;
...
> Append (cost=0.00..1.01 rows=1 width=4)
> -> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=4)
> Filter: (1 = a)
> (3 rows)

is the one which is affected. The parallel example of subselect
is,

> EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
> JOIN (SELECT int8in('0')) tt(x) ON tt.x = t.a;

I found nothing to be commented in the code.

I dont have a solid opinion on the balance between the usability
and the code size, but there's one concern about planning time.

By this patch, planner scanns the whole jointree once always, and
more once if deletable subqueries. But I don't understand it's
worth while for the gain or not..

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-03-09 07:43:19 Object files generated by ecpg test suite not ignored on Windows
Previous Message Fujii Masao 2015-03-09 07:29:23 Re: [REVIEW] Re: Compression of full-page-writes