[PATCH] btree_gist: add cross-type integer operator support for GiST

From: Alexander Nestorov <alexandernst(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] btree_gist: add cross-type integer operator support for GiST
Date: 2026-05-02 17:26:43
Message-ID: 36b4f67d-5975-452c-a6b8-b6407f0924ee@Spark
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

I'd like to submit a patch that adds cross-type operator support for the
three integer types (int2, int4, int8) to the btree_gist GiST operator
families. The patch also lays a general foundation for cross-type
dispatch that other type families can adopt.

Current problem:

GiST indexes are currently limited to same-type operators: the planner
can only match a query condition against an index column if the
operator's left and right argument types exactly match the types
registered in the index's operator family (pg_amop). This is true for
every data type that backs a GiST opclass.

When a query provides a value whose type is compatible but different
from the column type, the planner cannot use the index for that column.
The operator family lookup fails because no cross-type entry exists.
The planner must then either fall back to a sequential scan,
or in multi-column indexes use only the remaining column(s) and filter
the rest as a post-filter.

As a specific example, consider an int8 (bigint) column indexed with
btree_gist. The query:

    SELECT * FROM t WHERE camera_id = 1189;

Here 1189 is evaluated as int4 (integer). The planner resolves the
operator to =(int8,int4), which exists in pg_operator but is not
registered in the gist_int8_ops family. Only =(int8,int8) is
registered, so the column cannot be used as an index condition.

The workaround is to write an explicit cast in every query:

    WHERE camera_id = 1189::int8

This is fragile as ORMs, application parameter binding, and even hand-
written SQL queries produce values whose types do not exactly match
the column type.

A couple of self-contained reproduction scripts are included at the
end of this email. They create a table with a GiST index, insert
enough rows to make the plan difference visible, and run
EXPLAIN ANALYZE.

Proposed solution:

I'm working on a patch that extends the three btree_gist integer
operator families (gist_int2_ops, gist_int4_ops, gist_int8_ops)
with cross-type comparison and KNN-distance operators covering the
other two integer types.  Concretely:

    gist_int2_ops   <-  operators for (int2, int4) and (int2, int8)
    gist_int4_ops   <-  operators for (int4, int2) and (int4, int8)
    gist_int8_ops   <-  operators for (int8, int2) and (int8, int4)

For each pair I add all the standard btree_gist strategies for the
comparison operators (<, <=, =, >=, >, <>), plus the strategy for
the KNN distance operator (<->) used by ORDER BY.

I propose not to register separate cross-type support functions in
the operator family. GiST's amvalidate requires every support
function to have a matching left and right type, and registering
18 additional support functions (three families x two subtypes x
three strategies) would be verbose and error-prone. Instead, I
propose to dispatch cross-type queries directly inside the existing
consistent and distance functions and use the existing subtype
OID argument.

I'm thinking of introducing a general-purpose cross-type dispatch
table in btree_utils_num:

    typedef struct gbt_subtype_info
    {
        Oid         subtype;     /* right-hand Oid, e.g. INT4OID */
        gbt_cmp_fn  lt, le, eq, ge, gt;   /* comparison callbacks */
        gbt_dist_fn dist;                 /* KNN distance callback */
    } gbt_subtype_info;

Each integer opclass defines a static array of these entries:

    static const gbt_subtype_info gbt_int2_subtype_ops[] = {
        {INT4OID,  ... 6 comparison fns ... ,  distance fn},
        {INT8OID,  ... 6 comparison fns ... ,  distance fn},
        {InvalidOid}
    };

I'll replace the existing gbt_num_consistent() function with
gbt_num_consistent_x(), which takes a Datum query value and a
subtype Oid. If the subtype is InvalidOid or matches the indexed
type, the same-type path is used (backward compatible). Otherwise,
the function walks the dispatch table, finds the matching subtype
entry, and invokes the corresponding cross-type comparison or
distance callback.

Other btree_gist opclasses (float4/float8, date, timestamp, ...) and
even range-type GiST opfamilies in core can adopt the same pattern by
defining their own subtype dispatch tables and registering cross-type
operators via ALTER OPERATOR FAMILY.

I don't plan adding cross-type support for every data type as that
would result in a very bulky patch difficult to review, but I think
my proposal establishes the infrastructure so that follow-up work
for additional types is straightforward.

Is there interest in this patch? Should I proceed with polishing my
patch and sending it here for a review? I appreciate any feedback!

Thank you

Simple repro example:

DROP TABLE IF EXISTS camera_feeds_simple CASCADE;

CREATE TABLE camera_feeds_simple (
    id          SERIAL PRIMARY KEY,
    camera_id   int8 NOT NULL
);

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE INDEX idx_camera_feeds_gist
    ON camera_feeds_simple
    USING GIST (camera_id);

SET enable_seqscan = OFF;
SET enable_bitmapscan = ON;

INSERT INTO camera_feeds_simple (camera_id) VALUES
    (1), (1), (1), (1), (2), (2), (2), (2), (3), (3), (3),
    (4), (4), (4), (4), (4), (4), (4), (4), (5), (5), (6), (6);

ANALYZE camera_feeds_simple;

\echo 'Triggering current behaviour that fails to use the index'

EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_simple
WHERE camera_id = 4;

\echo 'Triggering query with manual cast workaround'

EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_simple
WHERE camera_id = 4::int8;

RESET enable_seqscan;
RESET enable_bitmapscan;

Repro example with two columns:

DROP TABLE IF EXISTS camera_feeds_multi CASCADE;

CREATE TABLE camera_feeds_multi (
    id          SERIAL PRIMARY KEY,
    camera_id   int8 NOT NULL,
    timerange   TSTZRANGE NOT NULL
);

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE INDEX idx_camera_feeds_gist
    ON camera_feeds_multi
    USING GIST (camera_id, timerange);

SET enable_seqscan = OFF;
SET enable_bitmapscan = ON;

INSERT INTO camera_feeds_multi (camera_id, timerange) VALUES
    (1, '[2026-04-01 06:00:00+00, 2026-04-01 12:00:00+00)'),
    (1, '[2026-04-01 12:00:00+00, 2026-04-01 18:00:00+00)'),
    (1, '[2026-04-01 18:00:00+00, 2026-04-02 06:00:00+00)'),
    (1, '[2026-04-02 06:00:00+00, 2026-04-02 12:00:00+00)'),

    (2, '[2026-04-01 07:00:00+00, 2026-04-01 09:00:00+00)'),
    (2, '[2026-04-01 08:30:00+00, 2026-04-01 11:00:00+00)'),
    (2, '[2026-04-01 10:00:00+00, 2026-04-01 14:00:00+00)'),
    (2, '[2026-04-01 13:00:00+00, 2026-04-01 17:00:00+00)'),

    (3, '[2026-03-15 00:00:00+00, 2026-03-15 23:59:59+00)'),
    (3, '[2026-03-20 08:00:00+00, 2026-03-20 20:00:00+00)'),
    (3, '[2026-04-01 00:00:00+00, 2026-04-01 23:59:59+00)'),

    (4, '[2026-04-01 00:00:00+00, 2026-04-01 06:00:00+00)'),
    (4, '[2026-04-01 06:00:00+00, 2026-04-01 08:00:00+00)'),
    (4, '[2026-04-01 08:00:00+00, 2026-04-01 12:00:00+00)'),
    (4, '[2026-04-01 12:00:00+00, 2026-04-01 14:00:00+00)'),
    (4, '[2026-04-01 14:00:00+00, 2026-04-01 18:00:00+00)'),
    (4, '[2026-04-01 18:00:00+00, 2026-04-01 22:00:00+00)'),
    (4, '[2026-04-01 22:00:00+00, 2026-04-02 00:00:00+00)'),
    (4, '[2026-04-01 09:00:00+00, 2026-04-01 15:00:00+00)'),

    (5, '[2026-03-30 00:00:00+00, 2026-04-02 00:00:00+00)'),
    (5, '[2026-04-01 10:00:00+00, 2026-04-01 10:30:00+00)'),

    (6, '[2026-04-01 12:00:00+00, 2026-04-01 12:00:00+00)'),
    (6, '[2026-04-01 12:00:00+00, 2026-04-01 13:00:00+00)');

ANALYZE camera_feeds_multi;

\echo 'Triggering current behaviour that fails to use the index'

EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_multi
WHERE camera_id = 4
  AND timerange && '[2026-04-01 10:00:00+00, 2026-04-01 16:00:00+00)'::tstzrange
ORDER BY timerange;

\echo 'Triggering query with manual cast workaround'

EXPLAIN (ANALYZE, COSTS, BUFFERS, TIMING, SUMMARY)
SELECT *
FROM camera_feeds_multi
WHERE camera_id = 4::int8
  AND timerange && '[2026-04-01 10:00:00+00, 2026-04-01 16:00:00+00)'::tstzrange
ORDER BY timerange;

RESET enable_seqscan;
RESET enable_bitmapscan;

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Jones 2026-05-02 17:32:30 Re: Fix bug with accessing to temporary tables of other sessions
Previous Message Alexander Korotkov 2026-05-02 16:51:52 Re: Two issues leading to discrepancies in FSM data on the standby server