From: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> |
---|---|
To: | pgman(at)candle(dot)pha(dot)pa(dot)us |
Cc: | phede-ml(at)islande(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unicode combining characters |
Date: | 2001-10-02 01:14:16 |
Message-ID: | 20011002101416E.t-ishii@sra.co.jp |
Lists: | pgsql-hackers |
> Can someone give me TODO items for this discussion?
What about:
Improve Unicode combined character handling
--
Tatsuo Ishii
> > > So, this shows two problems :
> > >
> > > - length() on the server side doesn't handle Unicode correctly [I have
> > > the same result with char_length()], and returns the number of chars
> > > (as it is however advertised to do), rather than the length of the
> > > string.
> >
> > This is a known limitation.
> >
> > > - the psql frontend makes the same mistake.
> > >
> > > I am using version 7.1.3 (debian sid), so it may have been corrected
> > > in the meantime (in this case, I apologise, but I have only recently
> > > started again to use PostgreSQL and I haven't followed -hackers long
> > > enough).
> > >
> > >
> > > => I think fixing psql shouldn't be too complicated, as the glibc
> > > should be providing the locale, and return the right values (is this
> > > the case ? and what happens for combined latin + chinese characters
> > > for example ? I'll have to try that later). If it's not fixed already,
> > > do you want me to look at this ? [it will take some time, as I haven't
> > > set up any development environment for postgres yet, and I'm away for
> > > one week from Thursday].
> >
> > Sounds great.
> >
> > > I was wondering if some people have already thought about this, or
> > > already done something, or if some of you are interested in this. If
> > > nobody does anything, I'll do something eventually, probably before
> > > Christmas (I don't have much time for this, and I don't need the
> > > functionality right now), but if there is an interest, I could team
> > > with others and develop it faster :)
> >
> > I'm very interested in your point. I will start studying [1][2] after
> > the beta freeze.
> >
> > > Anyway, I'm open to suggestions :
> > >
> > > - implement it in C, in the core,
> > >
> > > - implement it in C, as contributed custom functions,
> >
> > This may be a good starting point.
> >
> > > I can't really accept a solution which would rely on the underlying
> > > libc, as it may not provide the necessary locales (or maybe, then,
> >
> > I totally agree here.
> >
> > > The main functions I foresee are :
> > >
> > > - provide a normalisation function to all 4 forms,
> > >
> > > - provide a collation_key(text, language) function, as the calculation
> > > of the key may be expensive, some may want to index on the result (I
> > > would :) ),
> > >
> > > - provide a collation algorithm, using the two previous facilities,
> > > which can do primary to tertiary collation (cf TR#10 for a detailed
> > > explanation).
> > >
> > > I haven't looked at PostgreSQL code yet (shame !), so I may be
> > > completely off-track, in which case I'll retract myself and won't
> > > bother you again (on that subject, that is ;) )...
> > >
> > > Comments ?
> > --
> > Tatsuo Ishii
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>
>
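As an illustration of the length() problem and the "normalisation to all 4
forms" item quoted above, here is a minimal sketch. It is not PostgreSQL code
and is not taken from this thread: it uses Python's standard unicodedata
module, and the helper name visible_length is hypothetical. Skipping code
points with a non-zero canonical combining class is only a rough stand-in for
proper grapheme segmentation, since some combining marks have class 0.

```python
import unicodedata


def visible_length(s: str) -> int:
    """Count code points whose canonical combining class is 0.

    Hypothetical helper: a rough approximation of the number of
    user-perceived characters, not a full grapheme-cluster algorithm.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two code points,
# but a reader sees a single accented letter.
decomposed = "e\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# The four normalisation forms mentioned in the thread.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, decomposed)
    print(form, len(normalized), visible_length(normalized))
```

A plain code point count gives 2 for the decomposed string and 1 for the
composed one, while the helper gives 1 for both, which is the kind of
discrepancy reported for length() and psql. The collation_key() idea is not
covered here, because it depends on language-specific collation data.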