Re: Charset/collate support and function parameters

From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Charset/collate support and function parameters
Date: 2004-10-31 09:16:49
Message-ID: Pine.LNX.4.44.0410310931270.2015-100000@zigo.dhs.org
Lists: pgsql-hackers

On Sun, 31 Oct 2004, Tatsuo Ishii wrote:

> I don't understand your point. Today we already use one length()
> function for any charsets as Tom has already pointed out.

We have one length function that internally does different things
depending on the charset. If you want to add a charset and implement the
length function for that charset, how do you do that?

The length of a utf-8 string is not calculated the same way as the length
of a latin1 string. Each charset (encoding) has its own way of
calculating the length.

And by the way, today our databases work with just one charset at a time,
and what length() does is decided by a global variable. The difference we
are talking about here is the one between

length(latin1) ...
length(utf-8) ...
length(ascii) ...

and

length(x)
{
    if charset(x) == latin1 then
        ...
    else if charset(x) == utf-8 then
        ...
}
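
To make the first alternative concrete, here is a minimal sketch in
Python (not PostgreSQL code). It assumes a hypothetical registry,
LENGTH_IMPLS, keyed by charset name; register_length and the other names
are invented for the illustration. The point is only that adding a
charset means registering one more function, without editing a central
if/else chain.

```python
# Hypothetical sketch: one length implementation per charset, looked up
# by name. None of these names exist in PostgreSQL.
from typing import Callable, Dict

LENGTH_IMPLS: Dict[str, Callable[[bytes], int]] = {}


def register_length(charset: str):
    """Register a character-length function for one charset."""
    def decorator(func: Callable[[bytes], int]) -> Callable[[bytes], int]:
        LENGTH_IMPLS[charset] = func
        return func
    return decorator


@register_length("latin1")
def length_latin1(data: bytes) -> int:
    # Single-byte encoding: one byte per character.
    return len(data)


@register_length("utf-8")
def length_utf8(data: bytes) -> int:
    # Count every byte that is not a continuation byte (10xxxxxx).
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def length(data: bytes, charset: str) -> int:
    """Dispatch on the charset instead of branching inside one function."""
    try:
        impl = LENGTH_IMPLS[charset]
    except KeyError:
        raise ValueError(f"no length() registered for charset {charset!r}")
    return impl(data)
```

With this, length("Björklund".encode("utf-8"), "utf-8") counts characters
rather than bytes, and a new charset only needs its own registered
function.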

> The question in your approach is how you could handle the coercibility
> property. It's a transient and on memory property thus will not fit
> into the function declaration. No?

No, it's not part of the function signature. Coercibility is a way to
decide what collation to use. Depending on where a value comes from it
can have different coercibility, and when one does operations that
involve different collations the coercibility decides how ambiguities
are resolved (which value will be coerced).
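
As a rough sketch of how that resolution could work (Python, heavily
simplified): the four levels and the rules below are my reading of the
coercibility characteristics in the SQL standard, so treat the details as
assumptions rather than a specification. Collated and resolve are
hypothetical names.

```python
# Simplified, hypothetical model of coercibility-based collation
# resolution. The ordering of the levels is an assumption.
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Coercibility(IntEnum):
    EXPLICIT = 0   # value carries an explicit COLLATE clause
    NONE = 1       # conflicting collations, nothing could be derived
    IMPLICIT = 2   # collation comes from a column definition
    COERCIBLE = 3  # e.g. a literal; adopts the other operand's collation


@dataclass(frozen=True)
class Collated:
    collation: Optional[str]
    coercibility: Coercibility


def resolve(a: Collated, b: Collated) -> Collated:
    """Pick the collation for a binary operation on a and b."""
    if a.coercibility != b.coercibility:
        # The lower (stronger) level wins; the other operand is coerced.
        return a if a.coercibility < b.coercibility else b
    if a.collation == b.collation:
        return a
    if a.coercibility == Coercibility.EXPLICIT:
        raise ValueError("conflicting explicit collations")
    # Same non-explicit level but different collations: nothing derivable.
    return Collated(None, Coercibility.NONE)
```

So a column value (IMPLICIT) compared with a literal (COERCIBLE) uses the
column's collation, while two columns with different collations end up
with no usable collation until someone adds an explicit one.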

Whether one wants function signatures with charsets in them and where the
charset information is stored are separate questions; the two do not have
to be opposites of each other.

So far I have been thinking that one can avoid storing the charset in the
value by handling it in the types like that. I even thought that there was
no way that anyone in the pg project would ever accept enlarging the
string values, obviously a wrong assumption :-)

Even when one does store the charset in the value, one might want
function overloading to depend on the charset of the string (when
specified).

That's the same reasoning as when I declare a function

foo (x varchar(5))
begin
...
end

then I expect to get strings that are at most 5 chars long. Why do we
allow the (5) if it's just dropped? If I define a column as varchar(5)
then the column values really are at most 5 chars long, but it does not
work like that for functions.

Let us simply agree that we do store the charset/collation/... in the
(memory) values. On disk we don't want that, since the column type
determines it completely. Do we agree on that?
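
A minimal sketch of that split (Python, with invented names: ColumnType,
TextValue, to_disk and from_disk are all hypothetical). The in-memory
value carries its charset and collation, while the stored form is just
the bytes, and the column type supplies the rest when the value is read
back. Converting to the column's charset on store is an assumption of the
sketch.

```python
# Hypothetical sketch: charset/collation live in the in-memory value,
# while the on-disk form is bare bytes described by the column type.
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnType:
    charset: str
    collation: str


@dataclass(frozen=True)
class TextValue:
    data: bytes
    charset: str
    collation: str


def to_disk(value: TextValue, column: ColumnType) -> bytes:
    """Store only the bytes, converted to the column's charset if needed."""
    if value.charset != column.charset:
        return value.data.decode(value.charset).encode(column.charset)
    return value.data


def from_disk(raw: bytes, column: ColumnType) -> TextValue:
    """Re-attach charset and collation from the column type on read."""
    return TextValue(raw, column.charset, column.collation)
```

Nothing per-value is written beyond the bytes themselves, so the on-disk
size of a string does not grow.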

--
/Dennis Björklund
