Hash join not finding which collation to use for string hashing

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Hash join not finding which collation to use for string hashing
Date: 2020-01-28 23:36:03
Message-ID: 38121352-48B2-401F-84F3-0E5344D834D8@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

While reviewing the partition-wise join patch, I ran into an issue that exists in master, so rather than responding to that patch, I’m starting this new thread.

I noticed that this seems similar to the problem that was supposed to have been fixed in the "Re: COLLATE: Hash partition vs UPDATE” thread. As such, I’ve included Tom and Amit in the CC list.

Notice the "ERROR: could not determine which collation to use for string hashing”

The following is extracted from the output from the test:

> CREATE TABLE raw_data (a text);
> INSERT INTO raw_data (a) VALUES ('Türkiye'),
> ('TÜRKIYE'),
> ('bıt'),
> ('BIT'),
> ('äbç'),
> ('ÄBÇ'),
> ('aaá'),
> ('coté'),
> ('Götz'),
> ('ὀδυσσεύς'),
> ('ὈΔΥΣΣΕΎΣ'),
> ('を読み取り用'),
> ('にオープンできませんでした');
> -- Create unpartitioned tables for test
> CREATE TABLE alpha (a TEXT COLLATE "ja_JP", b TEXT COLLATE "sv_SE");
> CREATE TABLE beta (a TEXT COLLATE "tr_TR", b TEXT COLLATE "en_US");
> INSERT INTO alpha (SELECT a, a FROM raw_data);
> INSERT INTO beta (SELECT a, a FROM raw_data);
> ANALYZE alpha;
> ANALYZE beta;
> EXPLAIN (COSTS OFF)
> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE t1.a IN ('äbç', 'ὀδυσσεύς');
> QUERY PLAN
> ------------------------------------------------------------
> Hash Join
> Hash Cond: ((t2.a)::text = (t1.a)::text)
> -> Seq Scan on beta t2
> -> Hash
> -> Seq Scan on alpha t1
> Filter: (a = ANY ('{äbç,ὀδυσσεύς}'::text[]))
> (6 rows)
>
> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE t1.a IN ('äbç', 'ὀδυσσεύς');
> ERROR: could not determine which collation to use for string hashing
> HINT: Use the COLLATE clause to set the collation explicitly.
>


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-WIP-patch-to-demonstrate-problem.patch application/octet-stream 4.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kohei KaiGai 2020-01-29 00:59:57 Re: Is custom MemoryContext prohibited?
Previous Message Tomas Vondra 2020-01-28 23:24:56 Re: Complete data erasure