Re: Our trial to TPC-DS but optimizer made unreasonable plan

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Our trial to TPC-DS but optimizer made unreasonable plan
Date: 2015-08-20 01:08:57
Message-ID: 9A28C8860F777E439AA12E8AEA7694F80113582A@BPXM15GP.gisp.nec.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mon, Aug 17, 2015 at 9:40 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> > I think SortSupport logic provides a reasonable way to solve this
> > kind of problem. For example, btint4sortsupport() informs a function
> > pointer of the fast version of comparator (btint4fastcmp) which takes
> > two Datum argument without indirect memory reference.
> > This mechanism will also make sense for HashAggregate logic, to reduce
> > the cost of function invocations.
> >
> > Please comment on the idea I noticed here.
>
> It's possible that this can work, but it might be a good idea to run
> 'perf' on this query and find out where the CPU time is actually
> going. That might give you a clearer picture of why the HashAggregate
> is slow.
>
I tried to run one of CTE portion under the perf enabled.

HashAggregate still takes 490sec in spite of 70sec by underlying Join.

tpcds100=# explain analyze select c_customer_id customer_id
,c_first_name customer_first_name
,c_last_name customer_last_name
,c_preferred_cust_flag customer_preferred_cust_flag
,c_birth_country customer_birth_country
,c_login customer_login
,c_email_address customer_email_address
,d_year dyear
,sum(((ss_ext_list_price-ss_ext_wholesale_cost-ss_ext_discount_amt)+ss_ext_sales_price)/2) year_total
,'s' sale_type
from customer
,store_sales
,date_dim
where c_customer_sk = ss_customer_sk
and ss_sold_date_sk = d_date_sk
group by c_customer_id
,c_first_name
,c_last_name
,c_preferred_cust_flag
,c_birth_country
,c_login
,c_email_address
,d_year
;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=18194948.40..21477516.00 rows=262605408 width=178)
(actual time=483480.161..490763.640 rows=9142442 loops=1)
Group Key: customer.c_customer_id, customer.c_first_name, customer.c_last_name, customer.c_preferred_cust_flag,
customer.c_birth_country, customer.c_login, customer.c_email_address, date_dim.d_year
-> Custom Scan (GpuJoin) (cost=101342.54..9660272.64 rows=262605408 width=178)
(actual time=2430.787..73116.553 rows=268562375 loops=1)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys: (ss_sold_date_sk), JoinQual: (ss_sold_date_sk = d_date_sk),
nrows (287997024 -> 275041999, 95.50% expected 95.47%)
Depth 2: Logic: GpuHashJoin, HashKeys: (ss_customer_sk), JoinQual: (ss_customer_sk = c_customer_sk),
nrows (275041999 -> 268562375, 93.25% expected 91.18%)
-> Custom Scan (BulkScan) on store_sales (cost=0.00..9649559.60 rows=287996960 width=38)
(actual time=17.141..52757.354 rows=287997024 loops=1)
-> Seq Scan on date_dim (cost=0.00..2705.49 rows=73049 width=16)
(actual time=0.030..20.597 rows=73049 loops=1)
-> Seq Scan on customer (cost=0.00..87141.74 rows=2000074 width=156)
(actual time=0.010..585.861 rows=2000000 loops=1)
Planning time: 1.558 ms
Execution time: 492113.558 ms
(11 rows)

Perf output is below. Unlike my expectation, the largest portion was consumed
by bpchareq(6.76%) + bcTruelen(8.23%). One other big cluster is, probabaly,
TupleHashTableHash(1.11%) -> slot_getattr(4.29%) -> slot_deform_tuple(4.92%).

# ========
# captured on: Thu Aug 20 09:52:24 2015
# hostname : ayu.kaigai.gr.jp
# os release : 2.6.32-504.23.4.el6.x86_64
# perf version : 2.6.32-504.23.4.el6.x86_64.debug
# arch : x86_64
# nrcpus online : 48
# nrcpus avail : 48
# cpudesc : Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
# cpuid : GenuineIntel,6,63,2
# total memory : 396795400 kB
# cmdline : /usr/bin/perf record -a -e cycles
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0, excl_guest = 1, precise_ip = 0, attr_mmap2 = 0, attr_mmap = 1, attr_mmap_data = 0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, tracepoint = 2, software = 1
# ========
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 1558291468259
#
# Overhead Command Shared Object Symbol
# ........ ............... .......................... .....................................................
#
8.23% postgres postgres [.] bcTruelen
6.76% postgres postgres [.] bpchareq
4.92% postgres postgres [.] pg_detoast_datum
4.29% postgres postgres [.] slot_getattr
4.07% postgres postgres [.] AllocSetAlloc
3.58% postgres postgres [.] slot_deform_tuple
3.39% postgres postgres [.] div_var
3.35% postgres postgres [.] hash_search_with_hash_value
3.11% postgres postgres [.] hash_any
2.62% postgres postgres [.] make_result
2.50% postgres postgres [.] add_abs
2.24% postgres postgres [.] ExecAgg
2.23% postgres postgres [.] init_var_from_num
2.09% postgres postgres [.] pg_detoast_datum_packed
2.07% postgres postgres [.] ExecMakeFunctionResultNoSets
1.95% postgres [vsyscall] [.] 0x000000000000014c
1.88% postgres libc-2.12.so [.] memcpy
1.83% postgres postgres [.] execTuplesMatch
1.71% postgres postgres [.] sub_abs
1.70% postgres [kernel.kallsyms] [k] copy_user_generic_string
1.48% postgres pg_strom.so [.] pgstrom_data_store_insert_block
1.41% postgres postgres [.] palloc
1.38% postgres postgres [.] texteq
1.29% postgres [vdso] [.] 0x0000000000000890
1.11% postgres postgres [.] TupleHashTableHash
:
(only larger than 1.0%)

Indeed, 6 of 8 grouping keys in this query uses bpchar() data type, so it is
natural comparison function consumed larger portion of CPU cycles.
Do we have any idea to assist these queries by the backend?

tpcds100=# \d customer
Table "public.customer"
Column | Type | Modifiers
------------------------+-----------------------+-----------
c_customer_sk | bigint | not null
c_customer_id | character(16) | not null
c_current_cdemo_sk | bigint |
c_current_hdemo_sk | bigint |
c_current_addr_sk | bigint |
c_first_shipto_date_sk | bigint |
c_first_sales_date_sk | bigint |
c_salutation | character(10) |
c_first_name | character(20) |
c_last_name | character(30) |
c_preferred_cust_flag | character(1) |
c_birth_day | bigint |
c_birth_month | bigint |
c_birth_year | bigint |
c_birth_country | character varying(20) |
c_login | character(13) |
c_email_address | character(50) |
c_last_review_date_sk | bigint |

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-08-20 01:21:41 Re: Our trial to TPC-DS but optimizer made unreasonable plan
Previous Message Marko Tiikkaja 2015-08-20 00:36:33 Re: Supporting fallback RADIUS server(s)