Re: Why can I not get lexemes for Hebrew but can get them for Armenian?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Sam Saffron <sam(dot)saffron(at)gmail(dot)com>
Cc:	PGSQL Mailing List <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Why can I not get lexemes for Hebrew but can get them for Armenian?
Date:	2019-02-27 15:51:05
Message-ID:	15145.1551282665@sss.pgh.pa.us
Lists:	pgsql-general

Sam Saffron <sam(dot)saffron(at)gmail(dot)com> writes:
> So something is clearly different about the way the tokenisation is
> defined in PG. My question is, how do I figure out what is different
> and how do I make my mac install of PG work like the Linux one?

I'm not sure you can :-(. This devolves to what the libc locale
functions (isalpha(3) and friends) do, and unfortunately the UTF-8
locales on OS X are impossibly lame. They tend not to provide
useful character classifications for high Unicode code points.
They don't sort very well either, though that's not your problem here.
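
One way to see the difference between platforms is to ask libc directly how it classifies the characters in question. The sketch below is only an illustration, not PostgreSQL's own code: it assumes a POSIX system, assumes that the wide-character function iswalpha() is a fair stand-in for the "isalpha(3) and friends" mentioned above, and uses two sample words (Hebrew and Armenian) and a helper name, classify, that are hypothetical. Run it on both the Mac and the Linux machine and compare the output; no expected output is shown because it depends on the platform's locale data.

```python
import ctypes
import locale

# POSIX-only assumption: dlopen(NULL) exposes the symbols already loaded
# into the running process, which includes libc.
_libc = ctypes.CDLL(None)
_libc.iswalpha.argtypes = [ctypes.c_uint]  # wint_t, approximated as unsigned int
_libc.iswalpha.restype = ctypes.c_int


def classify(text):
    """Return (character, is_alpha) pairs as judged by libc's iswalpha()
    under whatever LC_CTYPE locale is currently active in this process."""
    return [(ch, bool(_libc.iswalpha(ord(ch)))) for ch in text]


if __name__ == "__main__":
    try:
        # Adopt the locale from the environment (LANG / LC_ALL / LC_CTYPE).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # fall back to whatever locale is already active

    # Hypothetical sample words, written as escapes to keep the source ASCII.
    samples = {
        "Hebrew": "\u05e9\u05dc\u05d5\u05dd",
        "Armenian": "\u0562\u0561\u0580\u0587",
    }
    print("LC_CTYPE:", locale.setlocale(locale.LC_CTYPE))
    for name, word in samples.items():
        flags = [
            f"U+{ord(ch):04X}={'alpha' if ok else 'not alpha'}"
            for ch, ok in classify(word)
        ]
        print(name, " ".join(flags))
```

If a script's letters come back as "not alpha" on one machine and "alpha" on the other, that would be consistent with the explanation above: the parser never sees those characters as word characters, so no lexemes come out.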

Depending on what characters you actually need to work with,
you might have better luck using one of the ISO 8859 character set
locales. Though if you actually need both Hebrew and Armenian
in the same DB, that suggestion is a nonstarter.
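
To illustrate why a single-byte character set cannot hold both scripts, here is a small sketch using Python's built-in codecs. It is only an analogy for the database encoding: the helper name encodable and the two sample words are hypothetical, and iso8859_8 (the Hebrew part of ISO 8859) is picked as the example codec.

```python
def encodable(text, codec):
    """Return True if every character of text can be represented in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample words, written as escapes to keep the source ASCII.
HEBREW = "\u05e9\u05dc\u05d5\u05dd"
ARMENIAN = "\u0562\u0561\u0580\u0587"

for label, word in (("Hebrew", HEBREW), ("Armenian", ARMENIAN)):
    print(label, "fits in ISO 8859-8:", encodable(word, "iso8859_8"))
print("both fit in UTF-8:", encodable(HEBREW + ARMENIAN, "utf-8"))
```

The Hebrew word fits in ISO 8859-8 and the Armenian one does not, while UTF-8 holds both, which is why mixing the two scripts in one database rules out the single-byte workaround.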

regards, tom lane

In response to

Why can I not get lexemes for Hebrew but can get them for Armenian? at 2019-02-27 10:11:37 from Sam Saffron
