Re: FTS parser - missing UUID token type

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FTS parser - missing UUID token type
Date: 2022-09-14 14:10:39
Message-ID: 2673581.1663164639@sss.pgh.pa.us
Lists: pgsql-hackers

Przemysław Sztoch <przemyslaw(at)sztoch(dot)pl> writes:
> I miss UUID, which indexes very strangely, is more and more popular and
> people want to search for it.

Really? UUIDs in running text seem like an extremely uncommon
use-case to me. URLs in running text are common nowadays, which is
why the text search parser has special code for that, but UUIDs?

Adding such a thing isn't cost-free either. Aside from the
probably-substantial development effort, we know from experience
with the URL support that it sometimes misfires and identifies
something as a URL or URL fragment when it really isn't one.
That leads to poorer indexing of the affected text. It seems
likely that adding a UUID token type would be a net negative
for most people, since they'd be subject to that hazard even if
their text contains no true UUIDs.

It's a shame that the text search parser isn't more extensible.
If it were, you could imagine having such a feature while making
it optional. I'm not volunteering to fix that though :-(

regards, tom lane
