From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | Uwe Schroeder <uwe(at)oss4u(dot)com> |
Cc: | "Reuven M(dot) Lerner" <reuven(at)lerner(dot)co(dot)il>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Searching for "bare" letters |
Date: | 2011-10-02 09:35:55 |
Message-ID: | Pine.LNX.4.64.1110021333280.26195@sn.sai.msu.ru |
Lists: | pgsql-general |
I don't see the problem - you can have a dictionary which does all the work of
recognizing bare letters and outputs several versions. Have you seen the
unaccent dictionary?
Oleg
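
The unaccent dictionary mentioned above ships with PostgreSQL as a contrib module. As a rough illustration of the kind of normalization it performs, here is a minimal Python sketch that strips accents via Unicode decomposition. This is an analogy only, not the module's own rules; the function names `strip_accents` and `matches_bare` are made up for this example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks removed, e.g. 'ñ' becomes 'n'."""
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def matches_bare(query: str, stored: str) -> bool:
    """Case-insensitive substring match that ignores accents on both sides."""
    return strip_accents(query).casefold() in strip_accents(stored).casefold()
```

With this sketch, `matches_bare("munoz", "Muñoz")` is true while the stored value keeps its accented form for display. Letters that have no Unicode decomposition, such as "ø", pass through unchanged and would need an explicit mapping.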
On Sun, 2 Oct 2011, Uwe Schroeder wrote:
>> Hi, everyone. Uwe wrote:
>>> What kind of "client" are the users using? I assume you will have some
>>> kind of user interface. For me this is a typical job for a user
>>> interface. The number of letters with "equivalents" in different
>>> languages is extremely limited, so a simple matching routine in the
>>> user interface should give you a way to issue the proper query.
>>
>> The user interface will be via a Web application. But we need to store
>> the data with the European characters, such as ñ, so that we can display
>> them appropriately. Much as I like your suggestion, we need to do
>> the opposite of what you're saying -- namely, take a bare letter, and
>> then search for letters with accents and such on them.
>>
>> I am beginning to think that storing two versions of each name, one bare
>> and the other not, might be the easiest way to go. But hey, I'm open
>> to more suggestions.
>>
>> Reuven
>
>
> That still doesn't hinder you from using a matching algorithm. Here is a simple
> example (to my understanding of the problem):
> You have texts stored in the db containing both an "n" and an "ñ". Now a client
> enters "n" on the website. What you want to do is look for both variations, so
> "n" translates into "n" or "ñ".
> There you have it. In the routine that receives the request you have a
> matching method that matches on "n" (or any of the few other characters with
> equivalents), and the routine will issue a query with xx LIKE '%n%' OR xx
> LIKE '%ñ%' (personally I would use ILIKE, since that eliminates the case
> problem).
>
> Since you're referring to a "name", I sure don't know the specifics of the
> problem or data layout, but from what I know I think you can tackle this with a
> rather primitive "match -> translate to" kind of algorithm.
>
> One thing I'd not do: store duplicate versions. There's always a way to deal
> with data the way it is. In my opinion, storing different versions of the same
> data just bloats a database instead of finding a smarter way to deal with the
> initial data.
>
> Uwe
>
>
>
>
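
The "match -> translate to" routine described in the quoted message could look roughly like the following Python sketch. Everything here is an assumption for illustration: the `EQUIVALENTS` table (which uses ñ as the accented counterpart of n), the function names, and the `%s` placeholder style, which follows drivers such as psycopg.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical, deliberately tiny equivalence table; a real one would list
# whichever letters the application needs to treat as equivalent.
EQUIVALENTS = {
    "n": ["n", "ñ"],
    "a": ["a", "á"],
}


def expand_variants(term: str) -> List[str]:
    """Return every spelling of term obtained by swapping in equivalents."""
    choices = [EQUIVALENTS.get(ch, [ch]) for ch in term]
    return ["".join(combo) for combo in product(*choices)]


def build_query(column: str, term: str) -> Tuple[str, List[str]]:
    """Build an 'ILIKE ... OR ILIKE ...' clause and its parameter list.

    column must be a trusted identifier; only the search term is passed
    as a bound parameter.
    """
    variants = expand_variants(term)
    clause = " OR ".join(f"{column} ILIKE %s" for _ in variants)
    params = [f"%{v}%" for v in variants]
    return clause, params
```

For the single letter "n" and a column named `xx`, this yields two ILIKE conditions joined by OR, with the parameters `%n%` and `%ñ%`. The number of variants multiplies with each expandable letter in the term, so this approach suits short search terms best.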
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83