Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Bug? ExecChooseHashTableSize() got assertion failed with crazy number of rows
Date: 2015-08-18 14:02:44
Message-ID: 9A28C8860F777E439AA12E8AEA7694F8011347B1@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

Hello,

I noticed that ExecChooseHashTableSize() in nodeHash.c fails on
Assert(nbuckets > 0) when the planner expects a crazy number of rows.

BACKTRACE:

#0 0x0000003f79432625 in raise () from /lib64/libc.so.6
#1 0x0000003f79433e05 in abort () from /lib64/libc.so.6
#2 0x000000000092600a in ExceptionalCondition (conditionName=0xac1ea0 "!(nbuckets > 0)",
errorType=0xac1d88 "FailedAssertion", fileName=0xac1d40 "nodeHash.c", lineNumber=545) at assert.c:54
#3 0x00000000006851ff in ExecChooseHashTableSize (ntuples=60521928028, tupwidth=8, useskew=1 '\001',
numbuckets=0x7fff146bff04, numbatches=0x7fff146bff00, num_skew_mcvs=0x7fff146bfefc) at nodeHash.c:545
#4 0x0000000000701735 in initial_cost_hashjoin (root=0x253a318, workspace=0x7fff146bffc0, jointype=JOIN_SEMI,
hashclauses=0x257e4f0, outer_path=0x2569a40, inner_path=0x2569908, sjinfo=0x2566f40, semifactors=0x7fff146c0168)
at costsize.c:2592
#5 0x000000000070e02a in try_hashjoin_path (root=0x253a318, joinrel=0x257d940, outer_path=0x2569a40, inner_path=0x2569908,
hashclauses=0x257e4f0, jointype=JOIN_SEMI, extra=0x7fff146c0150) at joinpath.c:543

See the following EXPLAIN output, taken from a build configured without --enable-cassert.
The planner expects 60.5B rows from a self join of a relation with 72M rows.
(This estimate is probably far too high.)

[kaigai(at)ayu ~]$ (echo EXPLAIN; cat ~/tpcds/query95.sql) | psql tpcds100
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=9168667273.07..9168667273.08 rows=1 width=20)
CTE ws_wh
-> Custom Scan (GpuJoin) (cost=3342534.49..654642911.88 rows=60521928028 width=24)
Bulkload: On (density: 100.00%)
Depth 1: Logic: GpuHashJoin, HashKeys: (ws_order_number), JoinQual: ((ws_warehouse_sk <> ws_warehouse_sk) AND (ws_order_number = ws_order_number)), nrows (ratio: 84056.77%)
-> Custom Scan (BulkScan) on web_sales ws1_1 (cost=0.00..3290612.48 rows=72001248 width=16)
-> Seq Scan on web_sales ws2 (cost=0.00..3290612.48 rows=72001248 width=16)
-> Sort (cost=8514024361.19..8514024361.20 rows=1 width=20)
Sort Key: (count(DISTINCT ws1.ws_order_number))
:

The crash was triggered by Assert(nbuckets > 0); nbuckets is calculated
as follows.

    /*
     * If there's not enough space to store the projected number of tuples and
     * the required bucket headers, we will need multiple batches.
     */
    if (inner_rel_bytes + bucket_bytes > hash_table_bytes)
    {
        /* We'll need multiple batches */
        long        lbuckets;
        double      dbatch;
        int         minbatch;
        long        bucket_size;

        /*
         * Estimate the number of buckets we'll want to have when work_mem is
         * entirely full. Each bucket will contain a bucket pointer plus
         * NTUP_PER_BUCKET tuples, whose projected size already includes
         * overhead for the hash code, pointer to the next tuple, etc.
         */
        bucket_size = (tupsize * NTUP_PER_BUCKET + sizeof(HashJoinTuple));
        lbuckets = 1 << my_log2(hash_table_bytes / bucket_size);
        lbuckets = Min(lbuckets, max_pointers);
        nbuckets = (int) lbuckets;
        bucket_bytes = nbuckets * sizeof(HashJoinTuple);
          :
          :
    }
    Assert(nbuckets > 0);
    Assert(nbatch > 0);

In my case, hash_table_bytes was 101017630802 and bucket_size was 48,
so my_log2(hash_table_bytes / bucket_size) = 31. Because both the literal "1"
and the result of my_log2() are int32, "1 << 31" overflows and lbuckets
gets a negative value. Min(lbuckets, max_pointers) then picks that negative
value (0x80000000 as an int32), which is assigned to nbuckets and triggers
the Assert().
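
The arithmetic above can be reproduced outside the backend. What follows is
only a minimal sketch under stated assumptions, not the attached patch and
not the nodeHash.c code: ceil_log2(), shift_as_int32() and choose_nbuckets()
are hypothetical stand-ins written for this illustration. The sketch assumes
the cap passed as max_pointers fits in an int (otherwise the final narrowing
would still overflow), and it assumes the usual two's-complement wraparound
when converting 0x80000000 to int32_t, which is implementation-defined in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for my_log2(): returns the smallest i such that
 * 2^i >= num.
 */
static int
ceil_log2(int64_t num)
{
    int         i = 0;
    int64_t     limit = 1;

    assert(num > 0 && num <= INT64_MAX / 2);
    while (limit < num)
    {
        limit <<= 1;
        i++;
    }
    return i;
}

/*
 * Emulates what a 32-bit "1 << shift" ends up holding, without invoking
 * signed-overflow undefined behavior: shift in unsigned arithmetic, then
 * convert to int32_t (implementation-defined; wraps on common compilers).
 */
static int32_t
shift_as_int32(int shift)
{
    assert(shift >= 0 && shift <= 31);
    return (int32_t) (UINT32_C(1) << shift);
}

/*
 * One possible overflow-safe variant: perform the shift in 64 bits and
 * clamp to the cap before narrowing to int.
 * Assumption: max_pointers is positive and no larger than INT_MAX.
 */
static int
choose_nbuckets(int64_t hash_table_bytes, int64_t bucket_size,
                int64_t max_pointers)
{
    int64_t     lbuckets;

    assert(bucket_size > 0 && hash_table_bytes >= bucket_size);
    assert(max_pointers > 0 && max_pointers <= INT_MAX);

    lbuckets = INT64_C(1) << ceil_log2(hash_table_bytes / bucket_size);
    if (lbuckets > max_pointers)
        lbuckets = max_pointers;
    return (int) lbuckets;
}
```

With the values from my case, ceil_log2(101017630802 / 48) is 31 and
shift_as_int32(31) is negative, which matches the failure. choose_nbuckets()
stays positive because the 64-bit shift yields 2^31 and the cap is applied
before the value is narrowed to int.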

The attached patch fixes the problem.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Attachment Content-Type Size
pgsql-fix-hash-nbuckets.patch application/octet-stream 662 bytes
