Re: [PATCHES] char/varchar locale support

From: "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu>
To: phd2(at)earthling(dot)net
Cc: Postgres Hackers List <hackers(at)postgresql(dot)org>, oleg(at)sai(dot)msu(dot)su
Subject: Re: [PATCHES] char/varchar locale support
Date: 1998-05-15 13:18:13
Message-ID: 355C4095.88DD1D94@alumni.caltech.edu
Lists: pgsql-hackers

(moved to hackers list)

> I am working on extending locale support for char/varchar types.
> Q1. I touched ...src/include/utils/builtins.h to insert the following
> macros:
> -----
> #ifdef USE_LOCALE
> #define pgstrcmp(s1,s2,l) strcoll(s1,s2)
> #else
> #define pgstrcmp(s1,s2,l) strncmp(s1,s2,l)
> #endif
> -----
> Is it right place? I think so, am I wrong?

Probably the right place. Probably the wrong code; see below...

> Q2. Bartunov said me I should read varlena.c. I read it and found
> that for every strcoll() for both strings there are calls to allocate
> memory (to make them null-terminated). Oleg said I need the same for
> varchar.
> Do I really need to allocate space for varchar? What about char? Is it
> 0-terminated already?

No, neither bpchar nor varchar is guaranteed to be null terminated.
Yes, you will need to allocate (palloc()) local memory for this. Your
two pgstrcmp() macros are not equivalent: strncmp() stops the
comparison at the specified limit (l), whereas strcoll() ignores l and
requires null-terminated strings.
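To illustrate the difference, here is a minimal standalone sketch. It is
not backend code: the demo_* names are made up for this example, and
malloc()/free() stand in for palloc()/pfree(). It shows that the
length-limited form can work directly on unterminated data, while the
strcoll() form first needs a null-terminated copy of each input:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length-limited comparison: usable on data without a terminator,
 * provided both buffers hold at least len bytes. */
int
demo_cmp_limited(const char *s1, const char *s2, size_t len)
{
    return strncmp(s1, s2, len);
}

/* Locale-aware comparison: strcoll() has no length argument, so each
 * input is copied into a buffer one byte longer and terminated there.
 * Assumption: malloc() is used here in place of palloc(). */
int
demo_cmp_locale(const char *s1, size_t len1, const char *s2, size_t len2)
{
    char *a1p = malloc(len1 + 1);
    char *a2p = malloc(len2 + 1);
    int   result;

    assert(a1p != NULL && a2p != NULL);

    memcpy(a1p, s1, len1);
    a1p[len1] = '\0';
    memcpy(a2p, s2, len2);
    a2p[len2] = '\0';

    result = strcoll(a1p, a2p);

    free(a1p);
    free(a2p);
    return result;
}
```

Note that each copy uses the length of its own input, so bytes beyond
the stated length never reach strcoll().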

If you look in varlena.c you will find several places with
#if USE_LOCALE
...
#else
...
#endif

Those blocks will need to be replicated in varchar.c for both bpchar and
varchar support routines.

The first example I looked at in varlena.c looks a bit troublesome :( In
the code snippet below (from text_lt), both input strings are copied
and terminated using the same length, even though the input lengths can
be different. Looks wrong to me:

memcpy(a1p, VARDATA(arg1), len);
*(a1p + len) = '\0';
memcpy(a2p, VARDATA(arg2), len);
*(a2p + len) = '\0';

Instead of "len" in each expression, it should probably be the length
of the corresponding argument (len1 for arg1, len2 for arg2):
len1 = VARSIZE(arg1)-VARHDRSZ
len2 = VARSIZE(arg2)-VARHDRSZ

Another possibility for implementation is to write a string comparison
routine (e.g. varlena_cmp()) which takes two arguments and returns -1,
0, or 1 for less than, equals, and greater than. All of the comparison
routines can call that one (which would have the #if USE_LOCALE), rather
than having USE_LOCALE spread through each comparison routine.
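A rough sketch of what such a shared routine might look like follows. It
is only an illustration under stated assumptions: demo_varlena is a
hypothetical stand-in for the real varlena layout (so the example
compiles outside the backend), malloc()/free() replace palloc()/pfree(),
and the function names are invented here:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a varlena value: a byte count plus data
 * that is NOT null terminated. */
typedef struct
{
    size_t      len;
    const char *data;
} demo_varlena;

/* Compare two values; returns -1, 0, or 1.
 * The USE_LOCALE switch lives only in this one routine. */
int
demo_varlena_cmp(const demo_varlena *a, const demo_varlena *b)
{
    int result;

#ifdef USE_LOCALE
    /* strcoll() needs terminated strings, so copy each input using
     * its own length (malloc() assumed in place of palloc()). */
    char *s1 = malloc(a->len + 1);
    char *s2 = malloc(b->len + 1);

    assert(s1 != NULL && s2 != NULL);
    memcpy(s1, a->data, a->len);
    s1[a->len] = '\0';
    memcpy(s2, b->data, b->len);
    s2[b->len] = '\0';

    result = strcoll(s1, s2);

    free(s1);
    free(s2);
#else
    /* Bytewise comparison over the common prefix; on a tie the
     * shorter value sorts first. */
    size_t n = (a->len < b->len) ? a->len : b->len;

    result = memcmp(a->data, b->data, n);
    if (result == 0 && a->len != b->len)
        result = (a->len < b->len) ? -1 : 1;
#endif

    /* Normalize to -1, 0, or 1. */
    return (result > 0) - (result < 0);
}

/* An individual comparison routine then reduces to a single call. */
int
demo_lt(const demo_varlena *a, const demo_varlena *b)
{
    return demo_varlena_cmp(a, b) < 0;
}
```

With this shape, the other comparison routines (less-or-equal, greater,
and so on) would each be a one-line test on the same return value.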

- Tom
