Quick Links

Re: TPC-H Q20 from 1 hour to 19 hours!

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: TPC-H Q20 from 1 hour to 19 hours!
Date:	2017-06-01 15:40:41
Message-ID:	3185.1496331641@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> Which brings me to the slightly suspicious bit. On 9.5, there's no
> difference between GROUP and GROUP+LIKE cases - the estimates are
> exactly the same in both cases. This is true too, but only without the
> foreign key between "partsupp" and "part", i.e. the two non-grouped
> relations in the join. And what's more, the difference (1737 vs. 16) is
> pretty much exactly 100x, which is the estimate for the LIKE condition.
> So it kinda seems 9.5 does not apply this condition for semi-joins,
> while >=9.6 does that.

I got around to looking into this finally. I assume that when you say
you added a foreign key from partsupp to part, you also created a unique
index on part.p_partkey --- there is no such index in dbt3's version of
the schema, anyway, so I had to add one to create a foreign key.
The unique index itself, never mind the foreign key, changes things
substantially for this query in HEAD, as a result of commit 92a43e485:
the semijoin to part becomes a plain join, and so we go through entirely
different code paths to estimate selectivity. But without that, there's
clearly something odd going on, because the LIKE condition ought to have
some effect on the estimate of number of matched rows.

I poked into it and found that the problem stems from the fact that the
initial estimate of the top join relation's size is estimated using
agg_lineitem.agg_partkey = part.p_partkey as the join condition, not
the original partsupp.ps_partkey = part.p_partkey qual. We can get
the former from the latter via equivalence-clause deductions, and
I think it's just bad luck that the join is formed first in that
direction. What that means is that eqjoinsel_semi() finds no statistics
for the LHS variable, although it does have stats for the RHS. There
is logic in eqjoinsel_semi() that attempts to reduce the semijoin
selectivity estimate proportionally to whatever restriction clauses were
applied to the RHS ... but if you look closely, that code has no effect
if we lack stats for the LHS. We'll fall past both the MCV-dependent
calculation and the nd1-vs-nd2 test and end up taking the selectivity
as just 0.5, independently of anything else.

It seems reasonable to me that we ought to derate that default estimate
by whatever we'd concluded the restriction selectivity on the inner rel
is. So I experimented with the attached patch and verified that it
results in the LIKE affecting the final rowcount estimate in this
situation. I've not tested it much further than that, other than checking
that we still pass regression tests.

I'm not sure if we ought to think about applying this now. It will likely
make things worse not better for the Q20 query, because it will cause the
underestimate for the full query to be even further off. Still, in
principle it seems like it ought to be an improvement in most cases.

The thing that would actually have a chance of improving matters for Q20
would be if we could see our way to looking through the aggregation
subquery and applying the foreign key constraint for lineitem. That
seems like a research project though; it's surely not happening for v10.

I'm also wondering idly if we could fix things so that the join size
estimate gets formed from a join condition that we have stats for rather
than one we lack them for. But I have no clear ideas on ways to go
about that that wouldn't be horrid kluges.

regards, tom lane

Attachment	Content-Type	Size
respond-to-inner-selectivity-even-without-stats.patch	text/x-diff	2.6 KB

In response to

Re: TPC-H Q20 from 1 hour to 19 hours! at 2017-04-06 20:37:12 from Tomas Vondra

Responses

Re: TPC-H Q20 from 1 hour to 19 hours! at 2017-06-11 17:27:16 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Dilip Kumar	2017-06-01 15:41:32	Re: <> join selectivity estimate question
Previous Message	Masahiko Sawada	2017-06-01 15:32:36	Re: An incomplete comment sentence in subtrans.c