| From: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Patch: Add tsmatch JSONPath operator for granular Full Text Search |
| Date: | 2026-02-01 11:02:37 |
| Message-ID: | CA+v5N41Rr-18cjSsc3a6YvGTmnaXmfNiQM3e_gvdw76Yj7y6zA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Mon, Jan 26, 2026 at 7:22 PM Florents Tselai <florents(dot)tselai(at)gmail(dot)com>
wrote:
> Hi,
>
> in real-life I work a lot with json & fts search, here's a feature I've
> always wished I had,
> but never tackle it. Until yesterday that is.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
> tsmatch "performance")');
>
> This patch introduces a tsmatch boolean operator to the JSONPath engine.
> By integrating FTS natively into path expressions,
> this operator allows for high-precision filtering of nested JSONB
> structures—
> solving issues with structural ambiguity and query complexity.
>
> Currently, users must choose between two suboptimal paths for FTS-ing
> nested JSON:
> - Imprecise Global Indexing
> jsonb_to_tsvector aggregates text into a flat vector.
> This ignores JSON boundaries, leading to false positives when the same key
> (e.g., "body")
> appears in different contexts (e.g., a "Product Description" vs. a
> "Customer Review").
>
> - Complex SQL Workarounds
> Achieving 100% precision requires unnesting the document via
> jsonb_array_elements and LATERAL joins.
> This leads to verbose SQL and high memory overhead from generating
> intermediate heap tuples.
>
> One of the most significant advantages of tsmatch is its ability to
> participate in multi-condition predicates
> within the same JSON object - something jsonb_to_tsvector cannot do.
>
> SELECT jsonb_path_query(doc, '$.comments[*] ? (@.user == "Alice" && @.body
> tsmatch "performance")');
>
> In a flat vector, the association between "Alice" and "performance" is
> lost.
> tsmatch preserves this link by evaluating the FTS predicate in-place
> during path traversal.
>
> While the SQL/JSON standard (ISO/IEC 9075-2) does not explicitly define an
> FTS operator,
> tsmatch is architecturally modeled after the standard-defined like_regex.
>
> The implementation follows the like_regex precedent:
> it is a non-indexable predicate that relies on GIN path-matching for
> pruning and heap re-checks for precision.
> Caching is scoped to the JsonPathExecContext,
> ensuring 'compile-once' efficiency per execution without violating the
> stability requirements of prepared statements.
>
> This initial implementation uses plainto_tsquery.
> However, the grammar is designed to support a "mode" flag (similar to
> like_regex flags)
> in future iterations to toggle between to_tsquery, websearch_to_tsquery,
> and phraseto_tsquery.
>
Here's a v2, that implements the tsqparser clause
So this should now work too
select jsonb_path_query_array('["fast car", "slow car", "fast and
furious"]', '$[*] ? (@ tsmatch "fast car" tsqparser "w")
| Attachment | Content-Type | Size |
|---|---|---|
| v2-0001-Add-tsmatch-JSONPath-operator-for-granular-Full-T.patch | application/octet-stream | 31.2 KB |
| From | Date | Subject | |
|---|---|---|---|
| Previous Message | Alexandre Felipe | 2026-02-01 10:02:06 | New access method for b-tree. |