Re: snowball ASCII stemmer configuration

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: snowball ASCII stemmer configuration
Date: 2020-06-16 13:53:46
Message-ID: 1300297.1592315626@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> There are two cases where these two columns are not the same:

> hindi english \
> russian english \

> The second one is old; the first one I added using the second one as
> example. But I wonder what the rationale for this is. Maybe for hindi
> one could make some kind of cultural argument, but for russian this
> seems entirely arbitrary.

Perhaps it is, but we have actual Russians who think it's a good idea.
I recall questioning that point some years ago, and Oleg replied that
they'd done that intentionally because (a) technical Russian uses a lot
of English words, and (b) it's easy to tell which is which thanks to
the disjoint letter sets.
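
The point about disjoint letter sets can be sketched in a few lines. This is
only an illustration in Python, not PostgreSQL's or Snowball's code: the table
and the function name stemmer_for_token are hypothetical, and it assumes the
second column of the quoted list names the stemmer applied to all-ASCII words.

```python
# Hypothetical illustration; none of these names come from PostgreSQL.
# Maps a language to the stemmer assumed for tokens that are entirely ASCII,
# following the two entries quoted earlier in this message.
ASCII_STEMMER_FOR = {
    "hindi": "english",
    "russian": "english",
}


def stemmer_for_token(token: str, language: str) -> str:
    """Return the name of the stemmer to apply to a single token.

    An all-ASCII token cannot be written in Cyrillic or Devanagari,
    so it is routed to the ASCII-side stemmer; anything else goes to
    the language's own stemmer.
    """
    if token.isascii():  # str.isascii() requires Python 3.7+
        return ASCII_STEMMER_FOR[language]
    return language
```

With this sketch, stemmer_for_token("database", "russian") yields "english",
while stemmer_for_token("база", "russian") yields "russian". The check is
reliable only because the two alphabets share no letters; for a language
written in a Latin-based alphabet the same test would not separate the words.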

Whether the same is true for Hindi, I have no idea.

> Moreover, AFAIK, the following other languages do not use Latin-based
> alphabets:

> arabic arabic \
> greek greek \
> nepali nepali \
> tamil tamil \

Hmm. I think all of those entries are ones that got added by me while
absorbing post-2007 Snowball updates, and I confess that I did not think
about this point. Maybe these should be changed.

regards, tom lane
