Re: fts, compond words?

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Mike Rylander <mrylander(at)gmail(dot)com>
Cc: POSTGRESQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: fts, compond words?
Date: 2005-12-08 10:33:11
Message-ID: 43980BE7.9000601@sigaev.ru
Lists: pgsql-general

> hrm... that is a problem. Though, I think that's a case of how the
> compiled expression is built from user input. Unless I'm mistaken
>
> a + ( foo1 | foo2 )
>
> is exactly equal to
>
> (a + foo1) | (a + foo2)
>
>
> Ahhh... but then there is the more complex example of
>
> a + foonish + bar
>
> becoming
>
> a + (foo1 | foo2) + bar
>
> .... but I guess that could be
>
> (a + foo1 + bar) | (a + foo2 + bar)
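The rewrite described in the quoted text distributes `|` over the `+` phrase operator. Below is a minimal Python sketch of that idea. It assumes a hypothetical representation in which a phrase is a list of positions and each position lists its alternative lexemes. This is not tsearch2's internal query tree.

```python
from itertools import product


def expand_phrase(positions: list[list[str]]) -> list[list[str]]:
    """Expand a phrase whose positions may hold alternatives into plain phrases.

    positions[i] lists the alternative lexemes for the i-th word, so
    a + (foo1 | foo2) + bar is [["a"], ["foo1", "foo2"], ["bar"]].
    """
    # product() picks one alternative per position, in phrase order.
    return [list(choice) for choice in product(*positions)]


def to_query(phrases: list[list[str]]) -> str:
    """Render the expansion in the '+' (phrase) / '|' (or) notation of this thread."""
    return " | ".join("(" + " + ".join(p) + ")" for p in phrases)
```

Calling `to_query(expand_phrase([["a"], ["foo1", "foo2"], ["bar"]]))` returns `"(a + foo1 + bar) | (a + foo2 + bar)"`. The number of expanded phrases is the product of the per-position alternative counts.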

That's a simple case. What about languages such as Norwegian or German? They have
compound words, and an ispell dictionary can split them into lexemes. But usually
there is more than one way to split a word:

forbruksvaremerkelov
forbruk vare merke lov
forbruk vare merkelov
forbruk varemerke lov
forbruk varemerkelov
forbruksvare merke lov
forbruksvare merkelov
(Note: I don't know the translation; this is just an example. While working on
compound word support, we found a word with 24 possible splits!)
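This sketch shows where the variants come from by enumerating every way to cover a compound with dictionary entries. The lexicon is a hypothetical toy. The linking "s" of "forbruks" is mapped to "forbruk" by hand. A real ispell dictionary derives splits from affix rules and compound flags, and this code does not model that.

```python
from functools import lru_cache

# Hypothetical toy lexicon: surface form -> lexeme. "forbruks" carries the
# linking "s" and is normalized to "forbruk" by hand. A real ispell dictionary
# would derive this from affix rules and compound flags instead.
LEXICON = {
    "forbruks": "forbruk",
    "forbruksvare": "forbruksvare",
    "vare": "vare",
    "varemerke": "varemerke",
    "varemerkelov": "varemerkelov",
    "merke": "merke",
    "merkelov": "merkelov",
    "lov": "lov",
}


def splits(word: str) -> list[tuple[str, ...]]:
    """Return every way to cover `word` with lexicon entries, as lexeme tuples."""

    @lru_cache(maxsize=None)
    def from_pos(start: int) -> tuple[tuple[str, ...], ...]:
        if start == len(word):
            return ((),)  # exactly one way to split an empty suffix: no parts
        result = []
        # Try every prefix of the remaining suffix, shortest first.
        for end in range(start + 1, len(word) + 1):
            lexeme = LEXICON.get(word[start:end])
            if lexeme is None:
                continue
            for rest in from_pos(end):
                result.append((lexeme,) + rest)
        return tuple(result)

    return list(from_pos(0))
```

With this lexicon, `splits("forbruksvaremerkelov")` yields the six variants listed above, in the same order. Memoizing on the start position avoids re-scanning shared suffixes. However, the number of results can still grow exponentially with the number of split points.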

So the query 'a + forbruksvaremerkelov' becomes awful:

a + ( (forbruk & vare & merke & lov) | (forbruk & vare & merkelov) | ... )
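Building that query mechanically makes the cost visible. The following sketch uses the same `+`, `&` and `|` notation, with the six splits hard-coded from the list above. The helper names are hypothetical.

```python
from math import prod

# The six splits of "forbruksvaremerkelov" listed above.
SPLITS = [
    ["forbruk", "vare", "merke", "lov"],
    ["forbruk", "vare", "merkelov"],
    ["forbruk", "varemerke", "lov"],
    ["forbruk", "varemerkelov"],
    ["forbruksvare", "merke", "lov"],
    ["forbruksvare", "merkelov"],
]


def compound_query(prefix: str, splits: list[list[str]]) -> str:
    """Build 'prefix + ( (s1) | (s2) | ... )', joining each split's parts with '&'."""
    groups = " | ".join("(" + " & ".join(s) + ")" for s in splits)
    return f"{prefix} + ( {groups} )"


def phrase_alternatives(split_counts: list[int]) -> int:
    """Plain phrases produced by distributing '|' over '+': one split count per word."""
    return prod(split_counts)
```

`compound_query("a", SPLITS)` produces the full query with six `&` groups. Suppose a phrase contained several compound words, and each set of splits were distributed over `+` as in the quoted rewrite. The number of plain phrases would then be the product of the split counts. For example, `phrase_alternatives([24, 24])` gives 576.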

Of course, that example is off the top of my head, but any solution for phrase
search should handle such corner cases reasonably.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/