Re: Clamping reulst row number of joins.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Clamping reulst row number of joins.
Date: 2015-03-06 15:07:53
Message-ID: 19430.1425654473@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> Hello, I had a report that a query gets wired estimated row
> numbers and it makes subsequent plans wrong.

> Although the detailed explanation is shown later in this mail,
> the problem here was that a query like following makes the path
> with apparently wrong row number.

> EXPLAIN SELECT a FROM (SELECT a FROM t1 UNION ALL SELECT 0) t
> JOIN (VALUES (1)) tt(x) ON tt.x = t.a;

Hm, the problem evidently is that we get a default selectivity estimate
for the ON condition. I think a proper fix for this would involve
teaching eqjoinsel (and ideally other join selectivity functions) how
to drill down into appendrels and combine estimates for the child
relations.

> I suppose that join nodes can safely clamp the number of its
> result rows by the product of the row numbers of underlying two
> paths. And if it is correct, the attached patch can correct the
> estimation like this.

Unfortunately, this patch is completely horrid, because it gives an unfair
advantage to the parameterized-nestloop plan. (The hacks in the other
two functions are no-ops, because only a nestloop plan will have a
parameterized path for the appendrel.) What's more, it's only a cosmetic
fix: it won't correct the planner's actual idea of the joinrel size, which
means it won't have an impact on planning any additional levels of
joining.

We really need to fix this on the selectivity-estimate side.

BTW, is that JOIN (VALUES(...)) thing common in applications, or did you
just use it to make a compact example? If it were something worth
optimizing, it seems like we could teach the planner to "pull up VALUES"
in the same way that it flattens sub-selects. I'm not sure if this is
worth the trouble or not, though.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-03-06 15:16:12 Re: parallel mode and parallel contexts
Previous Message Alvaro Herrera 2015-03-06 15:00:20 Re: MD5 authentication needs help