Re: Query became very slow after 9.6 -> 10 upgrade

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dmitry Shalashov <skaurus(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Query became very slow after 9.6 -> 10 upgrade
Date: 2017-11-22 23:07:07
Message-ID: 9755.1511392027@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Dmitry Shalashov <skaurus(at)gmail(dot)com> writes:
> Turns out we had not 9.6 but 9.5.

I'd managed to reproduce the weird planner behavior locally in the
regression database:

regression=# create table foo (f1 int[], f2 int);
CREATE TABLE
regression=# explain select * from tenk1 where unique2 in (select distinct unnest(f1) from foo where f2=1);
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=30.85..80.50 rows=6 width=244)
-> HashAggregate (cost=30.57..30.63 rows=6 width=4)
Group Key: (unnest(foo.f1))
-> HashAggregate (cost=30.42..30.49 rows=6 width=4)
Group Key: unnest(foo.f1)
-> ProjectSet (cost=0.00..28.92 rows=600 width=4)
-> Seq Scan on foo (cost=0.00..25.88 rows=6 width=32)
Filter: (f2 = 1)
-> Index Scan using tenk1_unique2 on tenk1 (cost=0.29..8.30 rows=1 width=244)
Index Cond: (unique2 = (unnest(foo.f1)))
(10 rows)

Digging into it, the reason for the duplicate HashAggregate step was that
query_supports_distinctness() punted on SRFs-in-the-targetlist, basically
on the argument that it wasn't worth extra work to handle that case.
Thinking a bit harder, it seems to me that the correct analysis is:
1. If we are proving distinctness on the grounds of a DISTINCT clause,
then it doesn't matter whether there are any SRFs, because DISTINCT
removes duplicates after tlist SRF expansion.
2. But tlist SRFs break the ability to prove distinctness on the grounds
of GROUP BY, unless all of them are within grouping columns.
It still seems like detecting the second case is harder than it's worth,
but we can trivially handle the first case, with little more than some
code rearrangement.

The other problem is that the output rowcount of the sub-select (ie, of
the HashAggregate) is being estimated as though the SRF weren't there.
This turns out to be because estimate_num_groups() doesn't consider the
possibility of SRFs in the grouping columns. It never has, but in 9.6 and
before the problem was masked by the fact that grouping_planner scaled up
the result rowcount by tlist_returns_set_rows() *after* performing
grouping. Now we're effectively doing that in the other order, which is
more correct, but that means estimate_num_groups() has to apply some sort
of adjustment. I suggest that it just multiply its old estimate by the
maximum of the SRF expansion counts. That's likely to be an overestimate,
but it's really hard to do better without specific knowledge of the
individual SRF's behavior.

In short, I propose the attached fixes. I've checked this and it seems
to fix Dmitry's original problem according to the test case he sent
off-list.

regards, tom lane

Attachment Content-Type Size
improve-distinct-on-srf-estimates.patch text/x-diff 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2017-11-22 23:19:45 Re: [HACKERS] Transaction control in procedures
Previous Message Thomas Munro 2017-11-22 21:49:37 Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

Browse pgsql-performance by date

  From Date Subject
Next Message Henrik Cednert (Filmlance) 2017-11-23 09:26:16 Re: pg_dump 3 times as slow after 8.4 -> 9.5 upgrade
Previous Message Dmitry Shalashov 2017-11-22 21:34:47 Re: Query became very slow after 9.6 -> 10 upgrade