Patch: Add tsmatch JSONPath operator for granular Full Text Search

From: Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Patch: Add tsmatch JSONPath operator for granular Full Text Search
Date: 2026-01-26 17:22:19
Message-ID: CA+v5N42aGz5sCa3HoURUK82b-PN6N6LtUj45vAcL7KEBd5i5fg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

in real-life I work a lot with json & fts search, here's a feature I've
always wished I had,
but never tackle it. Until yesterday that is.

SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');

This patch introduces a tsmatch boolean operator to the JSONPath engine.
By integrating FTS natively into path expressions,
this operator allows for high-precision filtering of nested JSONB
structures—
solving issues with structural ambiguity and query complexity.

Currently, users must choose between two suboptimal paths for FTS-ing
nested JSON:
- Imprecise Global Indexing
jsonb_to_tsvector aggregates text into a flat vector.
This ignores JSON boundaries, leading to false positives when the same key
(e.g., "body")
appears in different contexts (e.g., a "Product Description" vs. a
"Customer Review").

- Complex SQL Workarounds
Achieving 100% precision requires unnesting the document via
jsonb_array_elements and LATERAL joins.
This leads to verbose SQL and high memory overhead from generating
intermediate heap tuples.

One of the most significant advantages of tsmatch is its ability to
participate in multi-condition predicates
within the same JSON object - something jsonb_to_tsvector cannot do.

SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
tsmatch "performance")');

In a flat vector, the association between "Alice" and "performance" is lost.
tsmatch preserves this link by evaluating the FTS predicate in-place during
path traversal.

While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an
FTS operator,
tsmatch is architecturally modeled after the standard-defined like_regex.

The implementation follows the like_regex precedent:
it is a non-indexable predicate that relies on GIN path-matching for
pruning and heap re-checks for precision.
Caching is scoped to the JsonPathExecContext,
ensuring 'compile-once' efficiency per execution without violating the
stability requirements of prepared statements.

This initial implementation uses plainto_tsquery.
However, the grammar is designed to support a "mode" flag (similar to
like_regex flags)
in future iterations to toggle between to_tsquery, websearch_to_tsquery,
and phraseto_tsquery.

Attachment Content-Type Size
v1-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch application/octet-stream 22.2 KB
tsmatch_bench.out application/octet-stream 6.7 KB
tsmatch_bench.sql application/octet-stream 2.4 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2026-01-26 17:22:33 Re: pg_waldump: support decoding of WAL inside tarfile
Previous Message Andres Freund 2026-01-26 17:16:08 Re: unnecessary executor overheads around seqscans