Re: unaccent extension missing some accents

From: J Smith <dark(dot)panda+lists(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: unaccent extension missing some accents
Date: 2011-11-07 03:25:33
Message-ID: CADFUPgcnqNMrbNKxWX1gBRBG2TfOXC6LCoJnZkF84f_Pi2L2WA@mail.gmail.com
Lists: pgsql-hackers

On 2011-11-06, at 7:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> swscanf doesn't seem like an acceptable approach: it's a function that
> is relied on nowhere else in PG, so it adds new portability risks of its
> own. It doesn't exist on some platforms that we support (like the one
> I'm typing this message on) and there's no real good reason to assume
> that it's not broken in its own ways on others.
>
> If you really want to pursue this, I'd suggest parsing the line
> manually, perhaps via strchr searches for \t and \n. It likely wouldn't
> be very many more lines than what you've got here.
>
> However, the bigger picture is that OS X's UTF8 locales are broken
> through-and-through, and most of their other problems are not feasible
> to work around. So basically you can't use them for anything
> interesting, and it's not clear that it's worth putting any time into
> solving individual problems. In the particular case here, the issue
> presumably is that sscanf is relying on isspace() ... but we rely on
> isspace() directly, in quite a lot of places, so how much is it going
> to fix to dodge it right here?
>
> regards, tom lane

There are some fixes for isspace and friends that I've seen Python
using, so perhaps a similar fix could be applied in those cases. For
instance, maybe something like the code around line 674 here:

http://svn.python.org/view/python/trunk/Include/pyport.h?revision=81029&view=markup

Perhaps that would be suitable on OS X, at least in the case of isspace
et al. As far as I can tell, scanf doesn't seem to use isspace on my
system, so that would only be a partial fix for this and whatever other
situations isspace is used in. (I'm on a mobile now, so I can't check at
the moment.)
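
To make the idea concrete, here is a rough sketch of that kind of
workaround: route the single-byte test through the wide-character
classifier instead of calling isspace() directly. The name safe_isspace
is hypothetical (it is not an existing PostgreSQL or Python function),
and the assumption that btowc() plus iswspace() behaves sanely where
isspace() does not is untested on OS X here.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical replacement for isspace(): convert the byte to a wide
 * character with btowc() and classify it with iswspace().  btowc()
 * returns WEOF for EOF and for bytes that are not a complete character
 * in the current locale, and iswspace(WEOF) is false, so stray bytes of
 * a multibyte UTF-8 sequence are never reported as whitespace.
 */
static int
safe_isspace(int c)
{
    /* Same domain as the <ctype.h> functions: EOF or an unsigned char. */
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));

    return iswspace(btowc(c)) != 0;
}
```

Callers would pass (unsigned char) values, just as with isspace(). This
only covers direct isspace() calls; it does nothing for library routines
such as sscanf that do their own whitespace handling internally.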

This isn't really a huge deal for me but I'll try to get a chance to
write up a little parser anyways just for kicks.
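
A minimal sketch of what such a parser could look like, following the
strchr suggestion quoted above. It assumes each rules line has the form
"src<TAB>trg<NEWLINE>" with exactly one tab separating the two fields;
the function name split_rule_line and its return convention are made up
for illustration and are not taken from the unaccent source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split one rules line of the assumed form "src\ttrg\n" in place.
 * Returns 1 and sets *src and *trg on success.  Returns 0 if there is
 * no tab or if either field is empty.  The buffer may already have been
 * modified when 0 is returned.  No isspace() or scanf-family calls are
 * involved, so locale-dependent whitespace handling never comes into
 * play.
 */
static int
split_rule_line(char *line, char **src, char **trg)
{
    char *tab;
    char *eol;

    assert(line != NULL && src != NULL && trg != NULL);

    /* The first tab ends the source field. */
    tab = strchr(line, '\t');
    if (tab == NULL || tab == line)
        return 0;
    *tab = '\0';

    /* Trim the trailing newline, if any, from the target field. */
    eol = strchr(tab + 1, '\n');
    if (eol != NULL)
        *eol = '\0';
    if (tab[1] == '\0')
        return 0;

    *src = line;
    *trg = tab + 1;
    return 1;
}
```

Because the split is done purely on the bytes '\t' and '\n', multibyte
UTF-8 characters in either field pass through untouched.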

Cheers
