From: | "Daniel Verite" <daniel(at)manitou-mail(dot)org> |
---|---|
To: | "Jeff Davis" <pgsql(at)j-davis(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: CREATE DATABASE command for non-libc providers |
Date: | 2025-06-13 16:41:45 |
Message-ID: | eaafe5c4-a1eb-4028-92a1-722304875d86@manitou-mail.org |
Lists: | pgsql-hackers |
Jeff Davis wrote:
> The main challenge is backwards compatibility. Users of FTS would need
> to recreate all of their tsvectors and indexes dependent on them. It's
> even possible that some users only have tsvectors and don't store the
> original data in the database, which would further complicate matters.
Why would it be that bad?
FTS indexes don't get corrupted that way. You may get different
lexemes before and after the upgrade for some documents, and then
what?
The FTS parser has seen user-visible changes in the past, and
regenerating tsvectors because of them was merely a suggestion.
commit 61d66c44f18c73094a50a2ef97d26cc03e171dc0
Author: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Date: Tue Mar 29 17:59:58 2016 +0300
Fix support of digits in email/hostnames.

When tsearch was implemented I did several mistakes in hostname/email
definition rules:
1) allow underscore in hostname what prohibited by RFC
2) forget to allow leading digits separated by hyphen (like 123-x.com)
   in hostname
3) do no allow underscore/hyphen after leading digits in localpart of
   email

Artur's patch resolves two last issues, but by the way allows hosts name
like 123_x.com together with 123-x.com. RFC forbids underscore usage in
hostname but pg allows that since initial tsearch version in core,
although only for non-digits. Patch syncs support digits and nondigits
in both hostname and email.

Forbidding underscore in hostname may break existsing usage of tsearch
and, anyhow, it should be done by separate patch.

Author: Artur Zakirov
BUG: #13964
In the release notes:
Fix the default text search parser to allow leading digits in email
and host tokens (Artur Zakirov)
In most cases this will result in few changes in the parsing of
text. But if you have data where such addresses occur frequently, it
may be worth rebuilding dependent tsvector columns and indexes so
that addresses of this form will be found properly by text searches.
commit 2c265adea3129c917296b46a82786d67988ece2c
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Wed Apr 28 02:04:16 2010 +0000
Modify the built-in text search parser to handle URLs more nearly
according to RFC 3986. In particular, these characters now terminate
the path part of a URL: '"', '<', '>', '\', '^', '`', '{', '|', '}'.
The previous behavior was inconsistent and depended on whether a "?"
was present in the path.
Per gripe from Donald Fraser and spec research by Kevin Grittner.

This is a pre-existing bug, but not back-patching since the risks of
breaking existing applications seem to outweigh the benefits.
https://www.postgresql.org/docs/release/9.0.0/
E.24.3.5.1. Full Text Search
Use more standards-compliant rules for parsing URL tokens (Tom Lane)
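
To make the 9.0 URL rule quoted above concrete, here is a minimal Python sketch of the "these characters terminate the path part" behavior. It is only an illustration: the real parser is implemented in C inside PostgreSQL and handles much more than this, and the names URL_PATH_TERMINATORS and truncate_url_path are hypothetical, chosen for this example.

```python
# Illustration only, not PostgreSQL code.
# Characters that, per the commit message quoted above, terminate the
# path part of a URL: " < > \ ^ ` { | }
URL_PATH_TERMINATORS = set('"<>\\^`{|}')


def truncate_url_path(path: str) -> str:
    """Return the prefix of `path` before the first terminator character.

    Hypothetical helper: it models only the character list from the
    commit message and ignores every other rule of the real parser.
    """
    for i, ch in enumerate(path):
        if ch in URL_PATH_TERMINATORS:
            return path[:i]
    return path


if __name__ == "__main__":
    # The '>' ends the path, whether or not a '?' appeared before it.
    print(truncate_url_path("/docs/page?q=1>tail"))
    print(truncate_url_path("/docs/page>tail"))
```

A tsvector built before such a change could hold a longer URL token than one built after it, which is the kind of before/after difference in lexemes discussed above.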
Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/