[patch]HashJoin crash

From: Zhang Mingli <zmlpostgres(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [patch]HashJoin crash
Date: 2022-08-12 15:05:06
Message-ID: beb64ca0-91e2-44ac-bf4a-7ea36275ec02@Spark
Lists: pgsql-hackers

Hi,

I got a core dump when using a hash join on a Postgres-derived database (Greenplum DB).

I also found a way to reproduce it on Postgres.

Root cause:

In ExecChooseHashTableSize(), commit b154ee63bb uses the function
pg_nextpower2_size_t, whose argument must not be 0.

```
sbuckets = pg_nextpower2_size_t(hash_table_bytes / bucket_size);
```

In some corner cases hash_table_bytes < bucket_size, so the integer division yields 0 and the assertion inside pg_nextpower2_size_t fails.
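To make the arithmetic concrete, here is a minimal sketch in C. The helper name bucket_quotient is hypothetical and not part of PostgreSQL; it only mirrors the division whose result is passed to pg_nextpower2_size_t. The example values in the comments are taken from the reproduction in this mail (work_mem = 64, i.e. 64 kB, and tupwidth=825086 from the backtrace); the real bucket_size is somewhat larger than tupwidth because of per-tuple overhead, which I leave out here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not in PostgreSQL): returns the value that would be
 * handed to pg_nextpower2_size_t() for a given memory budget and bucket size.
 */
size_t
bucket_quotient(size_t hash_table_bytes, size_t bucket_size)
{
    assert(bucket_size > 0);    /* avoid division by zero in this sketch */

    /*
     * Integer division truncates toward zero, so the result is 0 whenever
     * hash_table_bytes < bucket_size, e.g. 65536 / 825086.
     */
    return hash_table_bytes / bucket_size;
}
```

With a 64 kB budget and a bucket of at least 825086 bytes, the quotient is 0, which is exactly the num=0 seen in frame #6 of the backtrace.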

SQL to reproduce:

```
--create a wide enough table to reproduce the bug
DO language 'plpgsql'
$$
DECLARE var_sql text := 'CREATE TABLE t_1600_columns('
 || string_agg('field' || i::text || ' varchar(255)', ',') || ');'
 FROM generate_series(1,1600) As i;
BEGIN
 EXECUTE var_sql;
END;
$$ ;

create table j1(field1 text);
set work_mem = 64;
set hash_mem_multiplier = 1;
set enable_nestloop = off;
set enable_mergejoin = off;

explain select * from j1 inner join t_1600_columns using(field1);

server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded

```

Part of the core dump backtrace:

```
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=139769161559104) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=139769161559104) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=139769161559104, signo=signo(at)entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007f1e8b3de476 in __GI_raise (sig=sig(at)entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007f1e8b3c47f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x0000558cc8884062 in ExceptionalCondition (conditionName=0x558cc8a21570 "num > 0 && num <= PG_UINT64_MAX / 2 + 1",
 errorType=0x558cc8a21528 "FailedAssertion", fileName=0x558cc8a21500 "../../../src/include/port/pg_bitutils.h", lineNumber=165) at assert.c:69
#6 0x0000558cc843bb16 in pg_nextpower2_64 (num=0) at ../../../src/include/port/pg_bitutils.h:165
#7 0x0000558cc843d13a in ExecChooseHashTableSize (ntuples=100, tupwidth=825086, useskew=true, try_combined_hash_mem=false, parallel_workers=0,
 space_allowed=0x7ffdcfa01598, numbuckets=0x7ffdcfa01588, numbatches=0x7ffdcfa0158c, num_skew_mcvs=0x7ffdcfa01590) at nodeHash.c:835
```

This patch fixes it easily:

```
- sbuckets = pg_nextpower2_size_t(hash_table_bytes / bucket_size);
+ if (hash_table_bytes < bucket_size)
+   sbuckets = 1;
+ else
+   sbuckets = pg_nextpower2_size_t(hash_table_bytes / bucket_size);
```
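For anyone who wants to try the guarded logic outside the server, below is a standalone sketch. Both function names are hypothetical: next_pow2_sketch is a simple loop-based stand-in for pg_nextpower2_size_t (it is not the real implementation, only the same contract of returning the smallest power of 2 that is >= num, with num > 0), and choose_sbuckets_sketch wraps the branch from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_nextpower2_size_t(): smallest power of 2
 * that is >= num. Like the real function, it requires num > 0.
 */
size_t
next_pow2_sketch(size_t num)
{
    size_t      result = 1;

    assert(num > 0 && num <= ((size_t) -1) / 2 + 1);
    while (result < num)
        result <<= 1;
    return result;
}

/*
 * Hypothetical wrapper around the patched logic: fall back to a single
 * bucket when the memory budget is smaller than one bucket.
 */
size_t
choose_sbuckets_sketch(size_t hash_table_bytes, size_t bucket_size)
{
    assert(bucket_size > 0);    /* assumption of this sketch */

    if (hash_table_bytes < bucket_size)
        return 1;
    return next_pow2_sketch(hash_table_bytes / bucket_size);
}
```

With the guard in place, the quotient passed on is always at least 1, so the num > 0 assertion can no longer fire from this call site.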

Alternatively, we could report an error/hint message telling users to increase work_mem/hash_mem_multiplier.

But I think letting it work is better.

The issue exists on master, 15, 14, 13.

Regards,
Zhang Mingli

Attachment Content-Type Size
vn-0001-Fix-HashJoin-crash.patch application/octet-stream 4.0 KB
