BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: xinyuliu(at)umich(dot)edu
Subject: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function
Date: 2020-09-19 03:32:41
Message-ID: 16625-07403f18463add24@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16625
Logged by: XINYU LIU
Email address: xinyuliu(at)umich(dot)edu
PostgreSQL version: 12.3
Operating system: Ubuntu 19.10
Description:

Hello,

We are developing a tool for automatically finding performance bugs in
PostgreSQL. Our key insight is that given a pair of semantic equivalent
queries, a robust DBMS should return the same result within a similar
execution time. Significant time difference suggests a potential performance
bug in the DBMS.

We are sharing a pair of TPC-H queries that exhibit a potential performance
bug in this report:

First query:
SELECT "s_suppkey"
FROM "supplier"
WHERE s_suppkey > 100;

Second query:
SELECT "s_suppkey"
FROM "supplier"
WHERE s_suppkey > 100
GROUP BY s_suppkey;

[Actual Behavior]
We executed both queries on the TPC-H benchmark of scale factor 5: the first
query takes only 17 millisecond, while the second query takes 42
millisecond. We think the time difference results from different plans
selected.

[Query Execution Plan]
First query:
QUERY PLAN

--------------------------------------------------------------------------------------------------------------
Seq Scan on supplier (cost=0.00..1734.00 rows=49909 width=4) (actual
time=0.047..14.898 rows=49900 loops=1)
Filter: (s_suppkey > 100)
Rows Removed by Filter: 100
Planning Time: 0.639 ms
Execution Time: 17.469 ms
(5 rows)

Second query:
QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1858.77..2357.86 rows=49909 width=4) (actual
time=30.093..38.541 rows=49900 loops=1)
Group Key: s_suppkey
-> Seq Scan on supplier (cost=0.00..1734.00 rows=49909 width=4) (actual
time=0.047..13.656 rows=49900 loops=1)
Filter: (s_suppkey > 100)
Rows Removed by Filter: 100
Planning Time: 0.669 ms
Execution Time: 42.270 ms
(7 rows)

[Expected Behavior]
Since these two queries are semantically equivalent, we were hoping that
PostgreSQL will return the same results in roughly the same amount of
time.

[Test Environment]
Ubuntu 19.10
PostgreSQL v12.3
Database: TPC-H benchmark (with scale factor 5)

[Steps for reproducing our observations]

* Download the dataset from the link:
https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing
* Set up TPC-H benchmark
tar xzvf tpch5_postgresql.tar.gz
cd tpch5_postgresql
db=tpch5
createdb $db
psql -d $db < dss.ddl
for i in `ls *.tbl`
do
echo $i
name=`echo $i|cut -d'.' -f1`
psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING
'LATIN1';"
done
psql -d $db < dss_postgres.ri
*Execute the queries

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Christophe Pettus 2020-09-19 15:57:24 Re: BUG #16625: Query Optimizer - Performance bug related to removal of unnecessary aggregate function
Previous Message Tom Lane 2020-09-18 21:22:30 Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification