Re: Empty string in lexeme for tsvector

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jean-Christophe Arnu <jcarnu(at)gmail(dot)com>
Cc: Artur Zakirov <zaartur(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Empty string in lexeme for tsvector
Date: 2021-09-29 19:36:11
Message-ID: 2997142.1632944171@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jean-Christophe Arnu <jcarnu(at)gmail(dot)com> writes:
> [ empty_string_in_tsvector_v4.patch ]

I looked through this patch a bit. I don't agree with adding
these new error conditions to tsvector_setweight_by_filter and
tsvector_delete_arr. Those don't prevent bad lexemes from being
added to tsvectors, so AFAICS they can have no effect other than
breaking existing applications. In fact, tsvector_delete_arr is
one thing you could use to fix existing bad tsvectors, so making
it throw an error seems actually counterproductive.

(By the same token, I think there's a good argument for
tsvector_delete_arr to just ignore nulls, not throw an error.
That's a somewhat orthogonal issue, though.)

What I'm wondering about more than that is whether array_to_tsvector
is the only place that can inject an empty lexeme ... don't we have
anything else that can add lexemes without going through the parser?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2021-09-29 19:53:51 Re: [BUG] failed assertion in EnsurePortalSnapshotExists()
Previous Message Ranier Vilela 2021-09-29 19:16:44 Re: jsonb crash