| From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
|---|---|
| To: | "pepone(dot)onrez" <pepone(dot)onrez(at)gmail(dot)com> |
| Cc: | PgSQL General ML <pgsql-general(at)postgresql(dot)org> |
| Subject: | Re: Initial ugly reverse-translator |
| Date: | 2009-01-16 05:18:05 |
| Message-ID: | Pine.LNX.4.64.0901160816450.9554@sn.sai.msu.ru |
| Lists: | pgsql-general |
Hi,
ltree and pg_trgm with UTF8 support are available from CVS HEAD; see
http://archives.postgresql.org/pgsql-committers/2008-06/msg00356.php
http://archives.postgresql.org/pgsql-committers/2008-11/msg00139.php
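Multibyte support matters because trigrams have to be cut from characters, not bytes. The Python sketch below assumes pg_trgm's documented scheme: words are lowercased, each word is padded with two leading spaces and one trailing space, and similarity is shared trigrams divided by all distinct trigrams. The `ascii_only` flag is a hypothetical model of the pre-fix behavior described in the quoted thread, where non-ASCII characters are ignored. It is not the actual C implementation.

```python
from typing import Set


def trigrams(text: str, ascii_only: bool = False) -> Set[str]:
    """Return pg_trgm-style trigrams for `text`.

    Words are runs of alphanumeric characters; each word is lowercased,
    padded with two spaces in front and one behind, and cut into
    overlapping 3-character slices.
    """
    def keep(ch: str) -> bool:
        if ascii_only:
            # Hypothetical model of byte-oriented handling: every
            # non-ASCII character is treated as a separator and lost.
            return ch.isascii() and ch.isalnum()
        return ch.isalnum()

    result: Set[str] = set()
    word = []
    for ch in text.lower() + " ":  # trailing space flushes the last word
        if keep(ch):
            word.append(ch)
        elif word:
            padded = "  " + "".join(word) + " "
            result.update(padded[i:i + 3] for i in range(len(padded) - 2))
            word = []
    return result


def similarity(a: str, b: str, ascii_only: bool = False) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a, ascii_only), trigrams(b, ascii_only)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


print(similarity("привет мир", "Привет, мир!"))                   # 1.0
print(similarity("привет мир", "Привет, мир!", ascii_only=True))  # 0.0
```

With character-aware word splitting, the two Russian strings differ only in case and punctuation and score 1.0. When non-ASCII characters are dropped, no trigrams survive at all, so the score is 0.0 and the strings are useless for matching.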
Oleg
On Fri, 16 Jan 2009, pepone.onrez wrote:
> On Sat, Apr 19, 2008 at 6:10 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>> On Sat, 19 Apr 2008, Tom Lane wrote:
>>
>>> Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
>>>>
>>>> Tom Lane wrote:
>>>>>
>>>>> I don't really see the problem. I assume from your reference to pg_trgm
>>>>> that you're using trigram similarity as the prefilter for potential
>>>>> matches
>>>
>>>> It turns out that's no good anyway, as it appears to ignore characters
>>>> outside the ASCII range. Rather less than useful for searching a
>>>> database of translated strings ;-)
>>>
>>> A quick look at the pg_trgm code suggests that it is only prepared to
>>> deal with single-byte encodings; if you're working in UTF8, which I
>>> suppose you'd have to be, it's dead in the water :-(. Perhaps fixing
>>> that should be on the TODO list.
>>
>> As well as ltree. They are on our TODO list:
>> http://www.sai.msu.su/~megera/wiki/TODO
>>
>
> Hi Oleg
>
> Your TODO list says that UTF8 support was added to ltree. Is this code
> currently available for download?
>
> Regards,
> José
>>>
>>> But in any case maybe the full-text-search stuff would be more useful
>>> as a prefilter? Although honestly, for the speed we need here, I'm
>>> not sure a prefilter is needed at all. Full text might be useful
>>> if a LIKE-based match fails, though.
>>>
>>>>> (And besides, speed doesn't seem like the be-all and end-all here.)
>>>
>>>> True. It's not so much the speed as the fragility when faced with small
>>>> changes to formatting. In addition to whitespace, some clients mangle
>>>> punctuation with features like automatic "curly"-quoting.
>>>
>>> Yeah. I was wondering whether encoding differences wouldn't be a huge
>>> problem in practice, as well.
>>>
>>> regards, tom lane
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>>
>
>
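The fragility Craig describes (reflowed whitespace, clients that turn straight quotes into "curly" ones) can be blunted by normalizing both the stored strings and the incoming text before the LIKE-based match Tom suggests. A minimal Python sketch follows, assuming normalization happens client-side. The punctuation table `_PUNCT` is a hypothetical, non-exhaustive choice, and `normalize` is an illustrative helper, not part of any PostgreSQL API.

```python
import unicodedata

# Hypothetical, non-exhaustive map of "smart" punctuation that some
# clients insert, folded back to the plain ASCII forms.
_PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
})


def normalize(text: str) -> str:
    """Canonicalize a message string before an exact or LIKE comparison."""
    # NFC merges decomposed sequences, e.g. "e" + U+0301 becomes U+00E9.
    text = unicodedata.normalize("NFC", text)
    text = text.translate(_PUNCT)
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    return " ".join(text.split())


print(normalize("Don\u2019t  panic,\n\u201cfriend\u201d\u2026"))
# Don't panic, "friend"...
```

Storing the normalized form alongside each original string would let an exact or LIKE comparison succeed despite these cosmetic differences, leaving trigram or full-text matching as the fallback for genuine rewording.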
Regards,
Oleg