BUG #15722: 9000x performance regression on query starting PostgreSQL 10 due to bad query plan

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: jeremyevans0(at)gmail(dot)com
Subject: BUG #15722: 9000x performance regression on query starting PostgreSQL 10 due to bad query plan
Date: 2019-03-29 01:14:11
Message-ID: 15722-97031b8087519419@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15722
Logged by: Jeremy Evans
Email address: jeremyevans0(at)gmail(dot)com
PostgreSQL version: 11.2
Operating system: Linux
Description:

For one complex query, we are seeing a huge performance regression starting
with PostgreSQL 10. We noticed this regression starting with an upgrade
from 9.3 to 11. We then tested on PostgreSQL 9.3, 9.4, 9.5, 9.6, 10, and 11
with this query by restoring a database dump and running the query. On
PostgreSQL 9.3-9.6, this query runs in a few seconds. In PostgreSQL 10 and
11 it takes multiple hours.

We tried manually running VACUUM and ANALYZE commands before running the
query, with no significant effect. As the restoration of the database dump
was done just before running this query, there should be no need to REINDEX.
This analysis was repeated on a separate physical server with a different
hardware configuration but similar PostgreSQL configuration, also on 9.3,
9.4, 9.5, 9.6, 10, and 11, with the same results.

The cause of this problem seems to be PostgreSQL underestimating the number
of rows needed by a hash join by almost 9 million times. On PostgreSQL
10-11, the expensive part of the query is a Nested Loop Inner Join of a Hash
Inner Join and a Index Scan. On PostgreSQL 9.3-9.6, this part of the query
uses a Nested Loop Semi Join of the Hash Inner Join and Index Scan.

The following files are available at
https://gist.github.com/jeremyevans/8a9693b1d3f14c94f635b1bf4275b2be

* Raw SQL Query
* Formatted SQL Query
* Database Schema Dump
* EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) output for:
* 9.3
* 9.4
* 9.5
* 9.6
* 10
* 11

Due to the consistently good behavior in 9.3-9.6, and the severity of the
performance decrease in 10-11, I think this is most likely a regression in
PostgreSQL's query planner or optimizer, which is why I'm posting this as a
bug report. If this is not considered a bug, please let me know and I can
post to pgsql-performance instead.

Any and all help greatly appreciated. If more information would be helpful,
please let me know.

Thanks,
Jeremy

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-03-29 08:54:38 BUG #15723: Export fails 'worker process died unexpectedly' but no error in pg_dump
Previous Message Sergei Kornilov 2019-03-28 18:47:20 Re: BUG #15721: FATAL: dsa_allocate could not find 97 free pages