| From: | Alexander Nestorov <alexandernst(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | [PATCH] btree_gist: add cross-type integer operator support for GiST |
| Date: | 2026-05-02 17:26:43 |
| Message-ID: | 36b4f67d-5975-452c-a6b8-b6407f0924ee@Spark |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello hackers,
I'd like to submit a patch that adds cross-type operator support for the
three integer types (int2, int4, int8) to the btree_gist GiST operator
families. The patch also lays a general foundation for cross-type
dispatch that other type families can adopt.
Current problem:
GiST indexes are currently limited to same-type operators: the planner
can only match a query condition against an index column if the
operator's left and right argument types exactly match the types
registered in the index's operator family (pg_amop). This is true for
every data type that backs a GiST opclass.
When a query provides a value whose type is compatible but different
from the column type, the planner cannot use the index for that column.
The operator family lookup fails because no cross-type entry exists.
The planner must then either fall back to a sequential scan,
or in multi-column indexes use only the remaining column(s) and filter
the rest as a post-filter.
As a specific example, consider an int8 (bigint) column indexed with
btree_gist. The query:
SELECT * FROM t WHERE camera_id = 1189;
Here 1189 is evaluated as int4 (integer). The planner resolves the
operator to =(int8,int4), which exists in pg_operator but is not
registered in the gist_int8_ops family. Only =(int8,int8) is
registered, so the column cannot be used as an index condition.
The workaround is to write an explicit cast in every query:
WHERE camera_id = 1189::int8
This is fragile as ORMs, application parameter binding, and even hand-
written SQL queries produce values whose types do not exactly match
the column type.
A couple of self-contained reproduction scripts are included at the
end of this email. They create a table with a GiST index, insert
enough rows to make the plan difference visible, and run
EXPLAIN ANALYZE.
Proposed solution:
I'm working on a patch that extends the three btree_gist integer
operator families (gist_int2_ops, gist_int4_ops, gist_int8_ops)
with cross-type comparison and KNN-distance operators covering the
other two integer types. Concretely:
gist_int2_ops <- operators for (int2, int4) and (int2, int8)
gist_int4_ops <- operators for (int4, int2) and (int4, int8)
gist_int8_ops <- operators for (int8, int2) and (int8, int4)
For each pair I add all the standard btree_gist strategies for the
comparison operators (<, <=, =, >=, >, <>), plus the strategy for
the KNN distance operator (<->) used by ORDER BY.
I propose not to register separate cross-type support functions in
the operator family. GiST's amvalidate requires every support
function to have a matching left and right type, and registering
18 additional support functions (three families x two subtypes x
three strategies) would be verbose and error-prone. Instead, I
propose to dispatch cross-type queries directly inside the existing
consistent and distance functions and use the existing subtype
OID argument.
I'm thinking of introducing a general-purpose cross-type dispatch
table in btree_utils_num:
typedef struct gbt_subtype_info
{
Oid subtype; /* right-hand Oid, e.g. INT4OID */
gbt_cmp_fn lt, le, eq, ge, gt; /* comparison callbacks */
gbt_dist_fn dist; /* KNN distance callback */
} gbt_subtype_info;
Each integer opclass defines a static array of these entries:
static const gbt_subtype_info gbt_int2_subtype_ops[] = {
{INT4OID, ... 6 comparison fns ... , distance fn},
{INT8OID, ... 6 comparison fns ... , distance fn},
{InvalidOid}
};
I'll replace the existing gbt_num_consistent() function with
gbt_num_consistent_x(), which takes a Datum query value and a
subtype Oid. If the subtype is InvalidOid or matches the indexed
type, the same-type path is used (backward compatible). Otherwise,
the function walks the dispatch table, finds the matching subtype
entry, and invokes the corresponding cross-type comparison or
distance callback.
Other btree_gist opclasses (float4/float8, date, timestamp, ...) and
even range-type GiST opfamilies in core can adopt the same pattern by
defining their own subtype dispatch tables and registering cross-type
operators via ALTER OPERATOR FAMILY.
I don't plan adding cross-type support for every data type as that
would result in a very bulky patch difficult to review, but I think
my proposal establishes the infrastructure so that follow-up work
for additional types is straightforward.
Is there interest in this patch? Should I proceed with polishing my
patch and sending it here for a review? I appreciate any feedback!
Thank you
Simple repro example:
DROP TABLE IF EXISTS camera_feeds_simple CASCADE;
CREATE TABLE camera_feeds_simple (
id SERIAL PRIMARY KEY,
camera_id int8 NOT NULL
);
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE INDEX idx_camera_feeds_gist
ON camera_feeds_simple
USING GIST (camera_id);
SET enable_seqscan = OFF;
SET enable_bitmapscan = ON;
INSERT INTO camera_feeds_simple (camera_id) VALUES
(1), (1), (1), (1), (2), (2), (2), (2), (3), (3), (3),
(4), (4), (4), (4), (4), (4), (4), (4), (5), (5), (6), (6);
ANALYZE camera_feeds_simple;
\echo 'Triggering current behaviour that fails to use the index'
EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_simple
WHERE camera_id = 4;
\echo 'Triggering query with manual cast workaround'
EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_simple
WHERE camera_id = 4::int8;
RESET enable_seqscan;
RESET enable_bitmapscan;
Repro example with two columns:
DROP TABLE IF EXISTS camera_feeds_multi CASCADE;
CREATE TABLE camera_feeds_multi (
id SERIAL PRIMARY KEY,
camera_id int8 NOT NULL,
timerange TSTZRANGE NOT NULL
);
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE INDEX idx_camera_feeds_gist
ON camera_feeds_multi
USING GIST (camera_id, timerange);
SET enable_seqscan = OFF;
SET enable_bitmapscan = ON;
INSERT INTO camera_feeds_multi (camera_id, timerange) VALUES
(1, '[2026-04-01 06:00:00+00, 2026-04-01 12:00:00+00)'),
(1, '[2026-04-01 12:00:00+00, 2026-04-01 18:00:00+00)'),
(1, '[2026-04-01 18:00:00+00, 2026-04-02 06:00:00+00)'),
(1, '[2026-04-02 06:00:00+00, 2026-04-02 12:00:00+00)'),
(2, '[2026-04-01 07:00:00+00, 2026-04-01 09:00:00+00)'),
(2, '[2026-04-01 08:30:00+00, 2026-04-01 11:00:00+00)'),
(2, '[2026-04-01 10:00:00+00, 2026-04-01 14:00:00+00)'),
(2, '[2026-04-01 13:00:00+00, 2026-04-01 17:00:00+00)'),
(3, '[2026-03-15 00:00:00+00, 2026-03-15 23:59:59+00)'),
(3, '[2026-03-20 08:00:00+00, 2026-03-20 20:00:00+00)'),
(3, '[2026-04-01 00:00:00+00, 2026-04-01 23:59:59+00)'),
(4, '[2026-04-01 00:00:00+00, 2026-04-01 06:00:00+00)'),
(4, '[2026-04-01 06:00:00+00, 2026-04-01 08:00:00+00)'),
(4, '[2026-04-01 08:00:00+00, 2026-04-01 12:00:00+00)'),
(4, '[2026-04-01 12:00:00+00, 2026-04-01 14:00:00+00)'),
(4, '[2026-04-01 14:00:00+00, 2026-04-01 18:00:00+00)'),
(4, '[2026-04-01 18:00:00+00, 2026-04-01 22:00:00+00)'),
(4, '[2026-04-01 22:00:00+00, 2026-04-02 00:00:00+00)'),
(4, '[2026-04-01 09:00:00+00, 2026-04-01 15:00:00+00)'),
(5, '[2026-03-30 00:00:00+00, 2026-04-02 00:00:00+00)'),
(5, '[2026-04-01 10:00:00+00, 2026-04-01 10:30:00+00)'),
(6, '[2026-04-01 12:00:00+00, 2026-04-01 12:00:00+00)'),
(6, '[2026-04-01 12:00:00+00, 2026-04-01 13:00:00+00)');
ANALYZE camera_feeds_multi;
\echo 'Triggering current behaviour that fails to use the index'
EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_multi
WHERE camera_id = 4
AND timerange && '[2026-04-01 10:00:00+00, 2026-04-01 16:00:00+00)'::tstzrange
ORDER BY timerange;
\echo 'Triggering query with manual cast workaround'
EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_multi
WHERE camera_id = 4::int8
AND timerange && '[2026-04-01 10:00:00+00, 2026-04-01 16:00:00+00)'::tstzrange
ORDER BY timerange;
RESET enable_seqscan;
RESET enable_bitmapscan;
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jim Jones | 2026-05-02 17:32:30 | Re: Fix bug with accessing to temporary tables of other sessions |
| Previous Message | Alexander Korotkov | 2026-05-02 16:51:52 | Re: Two issues leading to discrepancies in FSM data on the standby server |