| From: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
|---|---|
| To: | Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Tim van der Linden <tim(at)shisaa(dot)jp> |
| Cc: | Postgres General <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: Multiple word synonyms (maybe?) |
| Date: | 2015-10-20 13:45:22 |
| Message-ID: | 75690271.301221.1445348722562.JavaMail.yahoo@mail.yahoo.com |
| Lists: | pgsql-general |
On Tuesday, October 20, 2015 6:05 AM, Geoff Winkless <pgsqladmin(at)geoff(dot)dj> wrote:
> On 20 October 2015 at 11:35, Tim van der Linden <tim(at)shisaa(dot)jp> wrote:
>> Of course, I can simply go ahead and create my own synonym
>> dictionary with a jargon-specific synonym file to feed it. However,
>> most of the synonyms consist of more than a single word.
>
> Does the Thesaurus dictionary not do what you want?
>
> http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html#TEXTSEARCH-THESAURUS
+1
I had a very similar need for legal terms (e.g., "power of
attorney") and the thesaurus fit that need exactly.
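For illustration, a thesaurus setup for a phrase like that might look roughly like the sketch below. This is not the configuration I actually used; the names `legal`, `legal_thesaurus`, and `legal_terms`, and the replacement token `poa`, are all made up for the example. It assumes a file `legal_terms.ths` has been placed in the server's `tsearch_data` directory.

```sql
-- Assumed contents of $SHAREDIR/tsearch_data/legal_terms.ths (hypothetical):
--
--   power ? attorney : *poa
--
-- "?" stands in for a stop word ("of") as seen by the subdictionary, and
-- the leading "*" keeps the subdictionary from normalizing the replacement.

-- Thesaurus dictionary that uses the English stemmer as its subdictionary.
CREATE TEXT SEARCH DICTIONARY legal_thesaurus (
    TEMPLATE   = thesaurus,
    DictFile   = legal_terms,
    Dictionary = pg_catalog.english_stem
);

-- Hypothetical configuration, copied from the built-in English one.
CREATE TEXT SEARCH CONFIGURATION legal (COPY = pg_catalog.english);

-- Consult the thesaurus first, then fall back to the stemmer.
ALTER TEXT SEARCH CONFIGURATION legal
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart
    WITH legal_thesaurus, english_stem;

-- Try it out on a phrase.
SELECT to_tsvector('legal', 'She signed a power of attorney yesterday');
```

The thesaurus has to come before the stemmer in the mapping, since it can only recognize a phrase if it sees the words before another dictionary has consumed them.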
I don't know whether you'll run into the other need I had that
required some special handling for full text search with legal
documents: things like dates, case numbers, and statute cites were
not handled well by default. What I did there was to pick those
out with regular expression searches, put them into a
space-separated string, cast that to tsvector, assign a higher
weight to such key elements, and concatenate that tsvector with the
one generated from the standard text parser and dictionaries.
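A minimal sketch of that idea, assuming a hypothetical table `documents(id, body)` and using ISO-style dates as the only pattern (the real patterns for case numbers and statute cites would be more involved, and are not shown here). One detail worth noting: weights attach to positions, so the sketch appends a dummy position `:1` to each extracted token before the cast; without positions, `setweight` has nothing to label.

```sql
-- Hypothetical table: documents(id integer, body text).
SELECT d.id,
       setweight(
           COALESCE(
               -- Pull out every date-like token and give it a dummy
               -- position so that a weight can be attached to it.
               (SELECT string_agg(DISTINCT m[1] || ':1', ' ')
                  FROM regexp_matches(d.body, '\d{4}-\d{2}-\d{2}', 'g') AS m),
               ''          -- no matches: cast an empty string instead of NULL
           )::tsvector,    -- direct cast keeps the tokens exactly as extracted
           'A'             -- highest weight for these key elements
       )
       -- Concatenate with the vector from the normal parser and dictionaries.
       || to_tsvector('english', d.body) AS search_vector
  FROM documents AS d;
```

The direct cast bypasses the parser, which would otherwise split a token like a date into pieces. The extracted tokens must not contain characters that are special in tsvector input (quotes, colons, backslashes), and search terms of this kind need the same treatment on the query side so that they match the stored lexemes.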
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company