Performance degradation in TPC-H Q18

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Performance degradation in TPC-H Q18
Date: 2017-02-22 05:53:07
Message-ID: CAGz5QC+=fNTYgzMLTBUNeKt6uaWZFXJbkB5+7oWm-n9DwVxcLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello everyone,

While conducting the experiments for parallelism, Rafia came across a
hang in Q18 when plan uses partial and finalize hash aggregate. This
could be seen on both scale factors - 20 and 300, on setting work_mem
high enough so that the query uses hash aggregate. It seems that
commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e does not solve the
issue completely.

I've tested on scale factor 10 and work_mem 100mb and the issue is
reproducible. Below is the simplified part of the query which uses the
hash aggregate.

tpch10GB=# explain select l_orderkey from lineitem group by l_orderkey;

QUERY PLAN
------------------------------------------------------------------------------------------------
Finalize HashAggregate (cost=1612872.73..1616801.32 rows=392859 width=8)
Group Key: l_orderkey
-> Gather (cost=1528408.04..1610908.43 rows=785718 width=8)
Workers Planned: 2
-> Partial HashAggregate (cost=1527408.04..1531336.63
rows=392859 width=8)
Group Key: l_orderkey
-> Parallel Seq Scan on lineitem
(cost=0.00..1464922.43 rows=24994243 width=8)
(7 rows)

I've also tried to execute the same query with a different column name.

tpch10GB=# explain analyze select l_suppkey from lineitem group by l_suppkey;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Finalize HashAggregate (cost=1549511.80..1550493.37 rows=98157 width=8)
(actual time=22363.901..22380.361 rows=100000 loops=1)
Group Key: l_suppkey
-> Gather (cost=1528408.04..1549021.01 rows=196314 width=8)
(actual time=22107.387..22244.848 rows=300000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial HashAggregate (cost=1527408.04..1528389.61
rows=98157 width=8)
(actual time=22100.776..22135.063 rows=100000 loops=3)
Group Key: l_suppkey
-> Parallel Seq Scan on lineitem
(cost=0.00..1464922.43 rows=24994243 width=8)
(actual time=0.846..10664.258 rows=19995351 loops=3)
Planning time: 0.171 ms
Execution time: 22393.030 ms
(10 rows)

And, it finished in time although the stats in both the plans are
exactly same. Hence, the problem is with the query itself. As Robert
suggested offline, I've produced the perf report for the query.

99.44% 0.01% postgres postgres [.] LookupTupleHashEntry
74.31% 53.55% postgres postgres [.] tuplehash_insert
17.05% 17.04% postgres libc-2.17.so [.] __memcpy_ssse3_back
11.06% 11.05% postgres postgres [.] tuplehash_prev
10.81% 10.80% postgres postgres [.] tuplehash_next

It's clear from the perf report that something terrible is going on in
the simple hash insert. I've enabled the logging for hash
ops(SH_STAT()) during hash growing phase.

2017-02-21 16:42:39.634 IST [9372] LOG: size: 1048576, members:
838860, filled: 0.799999,
total chain: 2008236860, max chain: 66743, avg chain: 2394.007176,
total_collisions: 278348, max_collisions: 8, avg_collisions: 0.331817
2017-02-21 16:42:39.634 IST [9372] STATEMENT: explain analyze select
l_orderkey from lineitem group by l_orderkey;

2017-02-21 16:42:39.741 IST [9372] LOG: size: 2097152, members:
838860, filled: 0.400000,
total chain: 788175, max chain: 124, avg chain: 0.939579,
total_collisions: 216052, max_collisions: 6, avg_collisions: 0.257554
2017-02-21 16:42:39.741 IST [9372] STATEMENT: explain analyze select
l_orderkey from lineitem group by l_orderkey;

The value of max chain length is really high which hampers the query's
performance. To further understand the scenario, I've taken the
following results:

tpch10GB=# select count(distinct l_orderkey) from lineitem;
count
----------
15000000

tpch10GB=# select correlation from pg_stats where tablename='lineitem'
and attname='l_orderkey';
correlation
-------------
1

tpch10GB=# select count(distinct l_suppkey) from lineitem;
count
--------
100000

tpch10GB=# select correlation from pg_stats where tablename='lineitem'
and attname='l_suppkey';
correlation
-------------
0.00900259

It seems that l_orderkey produces a pretty bad case for hashing with
linear probing and a bad choice of hash function(as linear probing is
sensitive to elements with nearby hash codes). In [1], Andres's
proposed a hack to resize the hashtable when the chain is growing
quickly. Below is a comment from the patch explaining the objective:

+ /*
+ * To avoid negative consequences from overly imbalanced
+ * hashtables, grow the hashtable if collisions lead to large
+ * runs. The most likely cause of such imbalance is filling a
+ * (currently) small table, from a currently big one, in
+ * hash-table order.
+ *
+ * FIXME: compute boundaries in a more principled manner.
+ */

After applying the patch, TPC-H Q18 and the simplified query, both
completes in time. Although the patch works in this scenario, we need
a smarter way to trigger the hash expansion. The problem with the
patch is that it triggers a hash expansion even when the filled
percentage is pretty low, causing unnecessary memory consumption.

Thoughts?

[1] https://www.postgresql.org/message-id/20161123083351.5vramz52nmdokhzz%40alap3.anarazel.de
0001-Resize-simplehash.h-table-in-case-of-long-runs.patch

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2017-02-22 05:55:53 Re: Change in "policy" on dump ordering?
Previous Message Jim Nasby 2017-02-22 05:43:16 Re: mat views stats