json(b)_to_tsvector with numeric values

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: json(b)_to_tsvector with numeric values
Date: 2018-04-01 15:10:43
Message-ID: CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com
Lists: pgsql-hackers

Hi,

We've just noticed that the current implementation of `json(b)_to_tsvector` can
be confusing if the target document contains numeric values. In this case we
just drop them, and only string values contribute to the result:

select to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::jsonb);
   to_tsvector
-----------------
 'fat':2 'rat':3
(1 row)

The result would be less surprising if all values that can be converted to a
string representation (that is, strings and numeric values; there is nothing to
do for null and boolean) took part in it:

select to_tsvector('english', '{"a": "The Fat Rats", "b": 123}'::jsonb);
       to_tsvector
-------------------------
 '123':5 'fat':2 'rat':3
(1 row)
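
For comparison, here is a rough sketch of how a similar effect could be
approximated in plain SQL without the patch. It is only an illustration, not
what the patch does: it assumes a flat document (no nested objects or arrays),
and the word positions it produces may differ from the output above, since the
values are simply joined with a space before parsing.

```sql
-- Sketch only: keep string and numeric values of a flat jsonb object,
-- skip null and boolean, and feed the concatenated text to to_tsvector.
select to_tsvector('english', string_agg(e.value #>> '{}', ' '))
from jsonb_each('{"a": "The Fat Rats", "b": 123}'::jsonb) as e
where jsonb_typeof(e.value) in ('string', 'number');
```

The `#>> '{}'` operator is used here to get the text form of a scalar jsonb
value, and `jsonb_typeof` does the filtering by value type.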

The attached patch contains a small fix that is necessary to get the described
behavior. This patch doesn't touch `ts_headline` though, because following the
same approach there would require changing the type of an element in the
resulting json(b).
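
To make the type problem concrete, here is a small sketch. It assumes the
default `ts_headline` markers (`<b>` and `</b>`); the second document is a
hypothetical result written by hand, not actual `ts_headline` output.

```sql
-- The original element is a json number ...
select jsonb_typeof('{"b": 123}'::jsonb -> 'b');
-- ... but a highlighted value can only be stored as a json string.
select jsonb_typeof('{"b": "<b>123</b>"}'::jsonb -> 'b');
```

The first query reports `number` and the second one `string`, so highlighting
a numeric value would silently change the type of that element.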

Any opinions about this suggestion? Can it be considered a bug fix and
included in this release?

Attachment: jsonb_to_tsvector_numeric_v1.patch (application/octet-stream, 5.4 KB)
