Hashed SAOP on composite type with non-hashable column errors at runtime

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Hashed SAOP on composite type with non-hashable column errors at runtime
Date: 2026-06-05 14:27:00
Message-ID: ed221f95-f09b-4a9c-b05b-e1fed621ec87@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

There is an issue when we use a record-based array operation in SQL:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM test
WHERE (a,b) = ANY (ARRAY[
(1, 'w1'::tsvector), (2, 'w2'::tsvector), (3, 'w3'::tsvector),
(4, 'w4'::tsvector), (5, 'w5'::tsvector), (6, 'w6'::tsvector),
(7, 'w7'::tsvector), (8, 'w8'::tsvector), (9, 'w9'::tsvector)
]);
ERROR: could not identify a hash function for type tsvector

See the attachment for the full reproduction script.
This happens because the hashability check for the record and array types misses
the op_hashjoinable() test. With fewer than 9 elements the query executes
successfully.

Patch 0001 (attached) fixes this bug. It is a natural follow-up to 17da9d4c282,
the hashing of record types itself was introduced by 01e658fa74c. It deserves a
back-patch down to v14.

More interesting is that EXPLAIN doesn't expose whether the executor used the
hashed or the plain search strategy. That might be acceptable, since we know
hashing is always used from nine elements on. But it forces the user first to
read the source code, and then to inspect the catalog, to find out whether the
clause has a hash function. For a SubPlan we do have this information — so let's
take a look at v0-0002, which introduces a 'hashed' flag.

It would be too prosaic a bug fix if there weren't a nice corner case with the
anonymous record type. Consider the following:

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT count(*) FROM (SELECT g x, -g y FROM generate_series(1,300000) g) t
WHERE (x, y) = ANY
(array[(1,-1),(2,-2),(1,-1),(2,-2),(1,-1),(2,-2),(1,-1),(2,-2),(64,-64)]);

/*
-- Before the fix:
Aggregate (actual rows=1.00 loops=1)
Buffers: shared hit=63 read=5, temp read=513 written=513
-> Function Scan on generate_series g (actual rows=3.00 loops=1)
Filter: (ROW(g, (- g)) = ANY
('{"(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(64,-64)"}'::record[]))
Rows Removed by Filter: 299997
Buffers: shared hit=63 read=5, temp read=513 written=513
Planning:
Buffers: shared hit=45 read=16
Planning Time: 2.923 ms
Execution Time: 62.969 ms
(10 rows)

-- After the fix:
Aggregate (actual rows=1.00 loops=1)
Buffers: shared hit=42, temp read=513 written=513
-> Function Scan on generate_series g (actual rows=3.00 loops=1)
Filter: (ROW(g, (- g)) = ANY
('{"(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(1,-1)","(2,-2)","(64,-64)"}'::record[]))
Rows Removed by Filter: 299997
Buffers: shared hit=42, temp read=513 written=513
Planning:
Buffers: shared hit=88
Planning Time: 0.837 ms
Execution Time: 745.897 ms
(10 rows)
*/

You can see a regression here: a legitimate hashed SAOP is no longer hashed. The
fix for that is not so simple — we have to check every element of the array
before deciding whether the hashing strategy is possible. This is quite an
expensive operation, so I sketched a solution in patch 0003, but I'm not sure it
is worth developing: checking an anonymous type might simply be too expensive.
Should it be done only once, conditionally, with a size limit and result caching?

--
regards, Andrei Lepikhov,
pgEdge

Attachment Content-Type Size
v0-0001-Don-t-hash-a-record-array-SAOP-whose-input-type-i.patch text/plain 6.8 KB
v0-0002-Show-hashed-ScalarArrayOpExpr-decision-in-EXPLAIN.patch text/plain 3.2 KB
v0-0003-Recover-hashed-SAOP-for-anonymous-records-with-ha.patch text/plain 9.4 KB
bug-hashed-saop.sql text/plain 1.1 KB

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Fujii Masao 2026-06-05 15:15:57 Re: BUG #19511: contrib/dblink: NULL dereference in dblink_get_notify() when called without a prior connection
Previous Message Bruce Momjian 2026-06-05 14:10:01 Re: BUG #19512: PG 17.10: SIGSEGV in build_minmax_path (planner) and hash_bytes (HashAgg executor)