Re: DBT-3 with SF=20 got failed

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DBT-3 with SF=20 got failed
Date: 2015-08-19 11:55:40
Message-ID: CADyhKSWnRTvVzpgFXDrQ5sgvY4E+kWor7Qfj4qCt2VrFDEs+0A@mail.gmail.com
Lists: pgsql-hackers

2015-08-19 20:12 GMT+09:00 Simon Riggs <simon(at)2ndquadrant(dot)com>:
> On 12 June 2015 at 00:29, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>>
>> I see two ways to fix this:
>>
>> (1) enforce the 1GB limit (probably better for back-patching, if that's
>> necessary)
>>
>> (2) make it work with hash tables over 1GB
>>
>> I'm in favor of (2) if there's a good way to do that. It seems a bit
>> stupid not to be able to use a fast hash table because there's some artificial
>> limit. Are there any fundamental reasons not to use the
>> MemoryContextAllocHuge fix, proposed by KaiGai-san?
>
>
> If there are no objections, I will apply the patch for 2) to HEAD and
> backpatch to 9.5.
>
Please don't rush. :-)

It is not difficult to replace palloc() with palloc_huge(); however, it may lead
to another problem once the planner gives us a crazy estimation.
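Just for context, the kind of change being discussed is roughly the following
(an illustrative sketch of the bucket-array allocation in nodeHash.c, based on
my reading of the current code, not the actual patch):

    /* before: plain palloc0(), which fails for requests above MaxAllocSize (~1GB) */
    hashtable->buckets = (HashJoinTuple *)
        palloc0(nbuckets * sizeof(HashJoinTuple));

    /* after: the "huge" allocator, which accepts requests larger than 1GB */
    hashtable->buckets = (HashJoinTuple *)
        MemoryContextAllocHuge(hashtable->batchCxt,
                               (Size) nbuckets * sizeof(HashJoinTuple));
    memset(hashtable->buckets, 0, (Size) nbuckets * sizeof(HashJoinTuple));

The explicit memset() simply stands in for the zeroing that palloc0() did.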
Below is my comment from another thread.

==========
Also, we may need to pay attention to the reliability of the planner's
row-count estimation.
Even though the plan below says that the Join generates 60521928028 rows,
it actually generates 776157676 rows (about 1.3% of the estimate).

tpcds100=# EXPLAIN ANALYZE
  select ws1.ws_order_number, ws1.ws_warehouse_sk wh1, ws2.ws_warehouse_sk wh2
    from web_sales ws1, web_sales ws2
   where ws1.ws_order_number = ws2.ws_order_number
     and ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk;
                                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=25374644.08..1160509591.61 rows=60521928028 width=24) (actual time=138347.979..491889.343 rows=776157676 loops=1)
   Merge Cond: (ws1.ws_order_number = ws2.ws_order_number)
   Join Filter: (ws1.ws_warehouse_sk <> ws2.ws_warehouse_sk)
   Rows Removed by Join Filter: 127853313
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=73252.300..79017.420 rows=72001237 loops=1)
         Sort Key: ws1.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws1  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.023..39951.201 rows=72001237 loops=1)
   ->  Sort  (cost=12687322.04..12867325.16 rows=72001248 width=16) (actual time=65095.655..128885.811 rows=904010978 loops=1)
         Sort Key: ws2.ws_order_number
         Sort Method: quicksort  Memory: 7083296kB
         ->  Seq Scan on web_sales ws2  (cost=0.00..3290612.48 rows=72001248 width=16) (actual time=0.014..31046.888 rows=72001237 loops=1)
 Planning time: 0.232 ms
 Execution time: 530176.521 ms
(14 rows)
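To put that misestimation into perspective, a back-of-the-envelope calculation
(assuming, purely for illustration, one 8-byte bucket pointer per estimated row;
the real sizing logic in nodeHash.c also involves work_mem and NTUP_PER_BUCKET):

    estimated:  60,521,928,028 rows * 8 bytes ≈ 450 GB of bucket pointers
    actual:        776,157,676 rows * 8 bytes ≈   6 GB

In other words, trusting the estimate here would make the bucket array roughly
78 times larger than it needs to be.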

So, even if we allow nodeHash.c to allocate hash buckets larger than
1GB, the initial size should still be determined carefully.
Probably, 1GB is a good starting point, even if the table is expanded later.
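To make that concrete, here is a standalone sketch of what I have in mind (the
names and the constant are made up for illustration, this is not actual
nodeHash.c code): size the initial bucket array from the planner's estimate,
but never let the array itself exceed 1GB, and leave further growth to the
bucket-expansion path (ExecHashIncreaseNumBuckets(), switched to repalloc_huge()
where needed, if I read 9.5's nodeHash.c correctly) once the real tuple count
is known.

    #include <stddef.h>

    typedef void *HashBucket;                 /* stand-in for HashJoinTuple */

    /* illustration only: cap the *initial* bucket array at 1GB */
    #define MAX_INITIAL_BUCKET_BYTES  ((size_t) 1024 * 1024 * 1024)

    static size_t
    choose_initial_nbuckets(double planner_rows_estimate)
    {
        size_t  max_nbuckets = MAX_INITIAL_BUCKET_BYTES / sizeof(HashBucket);
        size_t  nbuckets = 1;

        /*
         * One bucket per estimated row, rounded up to a power of two, but
         * never so many that the bucket array alone would exceed the cap.
         * A wild overestimate (as in the plan above) is clamped here; the
         * hash table can still be expanded later, based on actual rows.
         */
        while ((double) nbuckets < planner_rows_estimate &&
               nbuckets * 2 <= max_nbuckets)
            nbuckets <<= 1;

        return nbuckets;
    }

With the plan above, this starts at 2^27 buckets (a 1GB array on a 64-bit
build) instead of the ~2^36 the raw estimate would suggest.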

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
