Re: How to add locale support for each column?

From: Mahmoud Taghizadeh <m_taghi(at)yahoo(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to add locale support for each column?
Date: 2004-09-25 10:04:25
Message-ID: 20040925100425.20198.qmail@web50708.mail.yahoo.com
Lists: pgsql-hackers pgsql-patches

And what is the final result? Is this method suitable?
Is it possible for the PostgreSQL leaders to add this function to the PostgreSQL source code?


glibc does not support the Jalali, Hebrew, and other such calendars, so PostgreSQL does not support them either. We are planning to write some simple functions to convert between the different calendars.

What does the group think?

Greg Stark <gsstark(at)mit(dot)edu> wrote:
Tom Lane writes:

> Greg Stark writes:
> > Peter Eisentraut writes:
> >> 2) switching the locale at run time is too expensive when using the system
> >> library.
>
> > Fwiw I did some experiments with this and found it wasn't true.
>
> Really?

We're following two different methodologies so the results aren't comparable.
I exposed strxfrm to postgres and then did a sort on strxfrm(col). The
resulting query times were slower than sorting on lower(col) by negligible
amounts.
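
As a rough illustration of sorting on strxfrm keys, here is a minimal standalone C sketch. It is not the SQL-level experiment described above: the names keyed_str, cmp_keyed and sort_by_strxfrm are hypothetical, and it simply assumes the caller has already chosen LC_COLLATE with setlocale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: a string paired with its strxfrm sort key. */
typedef struct
{
    const char *orig;
    char       *key;
} keyed_str;

/* Transformed keys compare with plain strcmp, no locale lookup needed. */
static int
cmp_keyed(const void *a, const void *b)
{
    const keyed_str *ka = a;
    const keyed_str *kb = b;

    return strcmp(ka->key, kb->key);
}

/*
 * Sort strs[0..n-1] in place according to the current LC_COLLATE locale
 * by transforming each string once and sorting on the keys.
 * Returns 0 on success, -1 on allocation failure (strs is then unchanged).
 */
int
sort_by_strxfrm(const char **strs, size_t n)
{
    keyed_str  *items;
    size_t      i;
    int         rc = 0;

    if (n == 0)
        return 0;
    items = calloc(n, sizeof *items);
    if (items == NULL)
        return -1;

    for (i = 0; i < n; i++)
    {
        /* with a size of 0, strxfrm only reports the key length */
        size_t      len = strxfrm(NULL, strs[i], 0);

        items[i].orig = strs[i];
        items[i].key = malloc(len + 1);
        if (items[i].key == NULL)
        {
            rc = -1;
            break;
        }
        strxfrm(items[i].key, strs[i], len + 1);
    }

    if (rc == 0)
    {
        qsort(items, n, sizeof *items, cmp_keyed);
        for (i = 0; i < n; i++)
            strs[i] = items[i].orig;
    }

    for (i = 0; i < n; i++)
        free(items[i].key);     /* unset entries are NULL from calloc */
    free(items);
    return rc;
}
```

The transform is paid once per string, and every comparison during the sort is then a plain byte comparison.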

I have the original code I wrote and Joe Conway's reimplementation of it using
setjmp/longjmp to protect against errors. However the list archives appear to
have been down that month so I've attached Joe Conway's implementation below.

> These are on machines of widely varying horsepower, so the absolute
> numbers shouldn't be compared across rows, but the general story holds:
> setlocale should be considered to be at least an order of magnitude
> slower than strcoll, and on non-glibc machines it can be a whole lot
> worse than that.

I don't see how this is relevant though. One way or another postgres is going
to have to sort strings in varying locales chosen at run-time. Comparing
against strcoll's execution time without changing locales is a straw
man. It's like comparing your tcp/ip bandwidth with the loopback interface's
bandwidth.
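
To make the two measurements being argued about concrete, here is a small timing sketch in plain C. It is only an assumption about how one might compare them: the function names time_strcoll and time_strcoll_switching are hypothetical, clock() is a coarse timer, and no numbers are claimed for any platform.

```c
#include <locale.h>
#include <string.h>
#include <time.h>

/* CPU seconds spent on iters strcoll calls in the current locale. */
double
time_strcoll(const char *a, const char *b, long iters)
{
    volatile int sink = 0;      /* keeps the calls from being optimized away */
    clock_t     start = clock();
    long        i;

    for (i = 0; i < iters; i++)
        sink = strcoll(a, b);
    (void) sink;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

/*
 * Same loop, but LC_COLLATE is switched to loc before each comparison and
 * back to "C" afterwards, so the cost of setlocale is included.
 * If loc is not installed, setlocale fails and the locale is not switched,
 * which makes the measurement meaningless. LC_COLLATE is "C" on return.
 */
double
time_strcoll_switching(const char *loc, const char *a, const char *b,
                       long iters)
{
    volatile int sink = 0;
    clock_t     start = clock();
    long        i;

    for (i = 0; i < iters; i++)
    {
        setlocale(LC_COLLATE, loc);
        sink = strcoll(a, b);
        setlocale(LC_COLLATE, "C");
    }
    (void) sink;
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}
```

Comparing the two results for the same inputs and iteration count gives the per-comparison overhead of switching on a given machine.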

I see no reason to think Postgres's implementation of looking up xfrm rules
for the specified locale will be any faster than the OS's. We know some OS's
suck but some certainly don't.

Perhaps glibc's locale handling functions ought to be available as a separate
library users of those OS's could install -- if it isn't already.

> I don't think we can take that attitude when the cost penalty involved
> can be a couple of orders of magnitude.

Aside from the above complaint there's another problem with your methodology.
An order of magnitude change in the cost for strcoll isn't really relevant
unless the cost for strcoll is significant to begin with. I suspect strcoll
costs are currently dwarfed by the palloc costs to evaluate the expression
already.

Here's the implementation in postgres from Joe Conway btw:

> ATTACHMENT part 2 message/rfc822
Date: Sat, 23 Aug 2003 09:48:49 -0700
From: Joe Conway
CC: Greg Stark, Stephan Szabo, Tom Lane, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Collation rules and multi-lingual databases

Joe Conway wrote:
> What about something like this?

Oops! Forgot to restore error handling. See below:

Joe

> 8<--------------------------------
>
> #include
> #include
>
> #include "postgres.h"
> #include "fmgr.h"
> #include "tcop/tcopprot.h"
> #include "utils/builtins.h"
>
> #define GET_STR(textp) \
>     DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp)))
> #define GET_BYTEA(str_) \
>     DatumGetTextP(DirectFunctionCall1(byteain, CStringGetDatum(str_)))
> #define MAX_BYTEA_LEN 0x3fffffff
>
> /*
> * pg_strxfrm - Function to convert string similar to the strxfrm C
> * function using a specified locale.
> */
> extern Datum pg_strxfrm(PG_FUNCTION_ARGS);
> PG_FUNCTION_INFO_V1(pg_strxfrm);
>
> Datum
> pg_strxfrm(PG_FUNCTION_ARGS)
> {
>     char *str = GET_STR(PG_GETARG_TEXT_P(0));
>     size_t str_len = strlen(str);
>     char *localestr = GET_STR(PG_GETARG_TEXT_P(1));
>     size_t approx_trans_len = 4 + (str_len * 3);
>     char *trans = (char *) palloc(approx_trans_len + 1);
>     size_t actual_trans_len;
>     char *oldlocale;
>     char *newlocale;
>     sigjmp_buf save_restart;
>
>     if (approx_trans_len > MAX_BYTEA_LEN)
>         elog(ERROR, "source string too long to transform");
>
>     oldlocale = setlocale(LC_COLLATE, NULL);
>     if (!oldlocale)
>         elog(ERROR, "setlocale failed to return a locale");
>     /* copy it: a later setlocale call may overwrite the returned string */
>     oldlocale = pstrdup(oldlocale);
>
>     /* catch elog while locale is set other than the default */
>     memcpy(&save_restart, &Warn_restart, sizeof(save_restart));
>     if (sigsetjmp(Warn_restart, 1) != 0)
>     {
>         memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart));
>         newlocale = setlocale(LC_COLLATE, oldlocale);
>         if (!newlocale)
>             elog(PANIC, "setlocale failed to reset locale: %s", localestr);
>         siglongjmp(Warn_restart, 1);
>     }
>
>     newlocale = setlocale(LC_COLLATE, localestr);
>     if (!newlocale)
>         elog(ERROR, "setlocale failed to set a locale: %s", localestr);
>
>     actual_trans_len = strxfrm(trans, str, approx_trans_len + 1);
>
>     /* if the buffer was not large enough, resize it and try again */
>     if (actual_trans_len >= approx_trans_len)
>     {
>         approx_trans_len = actual_trans_len + 1;
>         if (approx_trans_len > MAX_BYTEA_LEN)
>             elog(ERROR, "source string too long to transform");
>
>         trans = (char *) repalloc(trans, approx_trans_len + 1);
>         actual_trans_len = strxfrm(trans, str, approx_trans_len + 1);
>
>         /* if the buffer is still not large enough, punt */
>         if (actual_trans_len >= approx_trans_len)
>             elog(ERROR, "strxfrm failed, buffer insufficient");
>     }
>
>     newlocale = setlocale(LC_COLLATE, oldlocale);
>     if (!newlocale)
>         elog(PANIC, "setlocale failed to reset locale: %s", localestr);

    /* restore normal error handling */
    memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart));

>
>     PG_RETURN_BYTEA_P(GET_BYTEA(trans));
> }
>
> 8<--------------------------------
>
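
For readers without a backend build at hand, the same switch, transform, restore sequence can be sketched in plain C. This is only an approximation under stated assumptions: strxfrm_in_locale is a hypothetical name, malloc stands in for palloc, and the setjmp/longjmp protection is omitted because nothing here can throw an error partway through.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical standalone analogue of pg_strxfrm: returns a malloc'd
 * strxfrm key for src under the named LC_COLLATE locale, or NULL if the
 * locale cannot be set or memory runs out. The caller frees the result.
 * The previous LC_COLLATE setting is restored before returning.
 */
char *
strxfrm_in_locale(const char *src, const char *locale)
{
    const char *cur;
    char       *saved;
    char       *key = NULL;
    size_t      len;

    cur = setlocale(LC_COLLATE, NULL);
    if (cur == NULL)
        return NULL;

    /* copy the name: later setlocale calls may overwrite the returned string */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, cur);

    if (setlocale(LC_COLLATE, locale) != NULL)
    {
        /* first call only measures, second call fills the buffer */
        len = strxfrm(NULL, src, 0);
        key = malloc(len + 1);
        if (key != NULL)
            strxfrm(key, src, len + 1);

        /* switch back whether or not the allocation succeeded */
        setlocale(LC_COLLATE, saved);
    }

    free(saved);
    return key;
}
```

Asking strxfrm for the required length first avoids guessing a buffer size and retrying, at the price of a second pass over the string.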

--
greg


With Regards,
--taghi
