Re: Charset/collate support and function parameters

From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Charset/collate support and function parameters
Date: 2004-10-30 20:51:01
Message-ID: Pine.LNX.4.44.0410302044190.2015-100000@zigo.dhs.org
Lists: pgsql-hackers

On Sat, 30 Oct 2004, Tom Lane wrote:

> > Are you worried about performance or is it the smaller change that you
> > want?
>
> I'm worried about the fact that instead of, say, one length(text)
> function, we would now have to have a different one for every
> characterset/collation.

This is not about how the parameter information is stored, but let's
discuss it anyway. These are important issues.

I was hoping that we could implement functions where one doesn't have to
specify the charset and collation (but can if one wants to).

For some functions one really wants different ones depending on the
charset. Take the length function: we need to calculate the length
differently for each charset, so we can never have one length
function that works for every possible charset. We could have one pg
function that does N different things inside depending on the charset, but
that's not really a simplification.
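
As a rough illustration of why length depends on the charset, here is a
minimal Python sketch (not PostgreSQL code). The names length_latin1,
length_utf8 and LENGTH_BY_CHARSET are made up for the example; it only
shows that the same bytes give different character counts depending on
which charset they are declared to be in.

```python
# Hypothetical per-charset length functions; names are illustrative only.

def length_latin1(raw: bytes) -> int:
    # Single-byte charset: every byte is exactly one character.
    return len(raw)


def length_utf8(raw: bytes) -> int:
    # Multi-byte charset: count only the bytes that start a character,
    # i.e. skip continuation bytes of the form 10xxxxxx.
    return sum(1 for b in raw if (b & 0xC0) != 0x80)


# One entry per charset: this is the "N different things" either way,
# whether it lives in N functions or inside one dispatching function.
LENGTH_BY_CHARSET = {
    "latin1": length_latin1,
    "utf8": length_utf8,
}


def length(raw: bytes, charset: str) -> int:
    # The single entry point that dispatches on the charset.
    return LENGTH_BY_CHARSET[charset](raw)


if __name__ == "__main__":
    raw = "blå".encode("utf-8")    # 4 bytes: b, l, 0xC3, 0xA5
    print(length(raw, "utf8"))     # counted as UTF-8 characters
    print(length(raw, "latin1"))   # counted as single-byte characters
```

Whether the dispatch table lives inside one function or in the function
lookup itself, there is still one implementation per charset.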

For functions where one has not specified the charset of an argument, we
need to be able to pass that type information on to wherever we use
that argument. Variables already have a type, and if we have a (pseudo
code) function like

foo (a varchar) returns int
{
select length(a);
}

and call it with

foo ('foo' charset latin1)

then we need to make sure that variable a inside the function body of foo
gets its type from the caller. The call to length(a) will then
work out, since it would select the length function for latin1. I think it
should work, but an implementation is the only way to know.
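
The idea can be sketched in Python, purely as an illustration and not as
a proposal for how the backend would do it. VarcharType, Value,
LENGTH_IMPLS and resolve_length are hypothetical names; the point is
only that the charset lives in the type of the argument, and that the
body of foo picks the length implementation from the type the caller
passed in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarcharType:
    # The charset is part of the type, not stored inside each string.
    charset: str


@dataclass(frozen=True)
class Value:
    type: VarcharType
    raw: bytes


# Hypothetical per-charset implementations of length.
LENGTH_IMPLS = {
    "latin1": lambda raw: len(raw),
    "utf8": lambda raw: sum(1 for b in raw if (b & 0xC0) != 0x80),
}


def resolve_length(arg_type: VarcharType):
    # Stand-in for function lookup: choose length() by argument type.
    return LENGTH_IMPLS[arg_type.charset]


def foo(a: Value) -> int:
    # foo was declared with a plain varchar parameter, so the charset of
    # "a" is whatever the caller's argument type says it is.
    return resolve_length(a.type)(a.raw)


if __name__ == "__main__":
    # Corresponds to: foo ('foo' charset latin1)
    arg = Value(VarcharType("latin1"), "foo".encode("latin-1"))
    print(foo(arg))
```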

Every string does in the end need to know what charset and what collation it
is in. Otherwise it cannot be used for anything, not even to compare it
with another string.

I could even imagine having different functions for each
charset/collation. Not that many built-in functions are affected,
and not all of them need to work with every collation. The user just needs
to call them with the correct one. I don't expect any functions like

foo (a varchar collation sv_SE,
b varchar collation en_US)

or any other combination of a and b. If anything, a and b will be the same
type. So there would not be arbitrarily many combinations (but still a lot).

The alternative is storing the charset and collation inside each string.
That seems like too high a price to pay; it belongs in the type.

> Not to mention one for every possible N in varchar(N).

This doesn't matter, since one can always implement functions to take
varchar arguments without any limit, and then any shorter string can be
implicitly cast up to that type. Or one can treat the length exactly like
the charset above.

Of course you do not want one length function for each length.

--
/Dennis Björklund
