Re: Portable interfaces ...

From: "Preston A(dot) Elder" <prez(at)neuromancy(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Portable interfaces ...
Date: 2004-03-27 18:31:37
Message-ID: 1080412297.23111.147.camel@temple.srealm.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-interfaces

On Fri, 2004-03-26 at 05:22, Peter Eisentraut wrote:
> I don't understand what the API has to do with Unicode. Either your
> strings are in Unicode or they are not. This is set by your
> application.

OK, lets say I'm writing a library that is not supposed to know anything
at all about the database structure. I'm just writing something to give
the client-programmer a generic interface to many different storage
types (postgres, berkelydb, an XML file, etc). In this instance, they
may give me a hint as to whether to allow standard or wide strings.

I need to:
1) verify that the database will allow wide (multi-byte) strings if they
have chosen to.
2) find out the encoding style of that db/table so I know how to encode
it into an multi-byte string (since I don't think the libpq API accepts
wchar_t strings, but I think it accepts multi-byte strings).
3) Also know the encoding style so that I can actually encode my wchar_t
strings into multi-byte strings (and vice versa) - since this is very
locale dependant.

As I said, unicode in postgres seems to be an afterthought - this is
also evidenced that encoding can only be specified at the database
level, when most databases I've seen allow it down to the column level
(with inheritance from table then database).

I found out the encoding is available with pg_result->content_encoding.
However
1) The definition of pg_result is not part of the 'standard' API (its in
the 'private' libpq headers). So again, I cannot rely on systems having
it defined.
2) It is an integer, which maps more to an internal encoding type than
to anything I can use with a standard multibyte<->wchar_t function.

> > For now, I am (grudgingly) copying the OID codes, and going to use
> > system-based multibyte/unicode functions (and hope to heck that they
> > are compatible with postgres, but I get the feeling they are not
> > fully compatible after browsing pg_wchar.h)
> They should be.

I should also not have to 'hope it works' for standard (non-user
defined) types. Nor should I have to execute a select to figure out the
oid type for a standard type - which incurs not just the penalty of
having to do a select and waiting for SQL to get back to me, but also a
string comparison of the results (often repeatedly - strcmp(result,
"bool")==0, strcmp(result, "int4")==0, ...) to check what it is. I
realise I could just do this on startup, and build my own table, but its
still a CPU/wait time penalty. And copying constants out of elsewhere
when the constants should be provided is also rather hackish.

> > is, right now, I have no verification on things like string lengths
> > because theres no real way to find out the maximum length of a field
> > (encoding-specific - ie. for UTF-8, a length of 20 means 20 chars,
> > for UTF-16, a length of 20 means 20 wchar_t's (40 bytes)).
> The libpq API gives you a way to determine the binary length of a
> returned datum. You need to allocate that dynamically.
I'm not talking about allocation here. I'm talking about string
length. Or more correctly, maximum string length. I want to find out
from the database the maximum length of a string I can pass - even if
its in bytes (in which case I divide it by 2 for 2-byte encoding systems
... kindof). Which is very different to how many bytes it uses up in
the database's storage.

> The plans are more or less that if you don't like it, try using a
> different one, such as ODBC, or if a different programming language
> than C.
Which is the wrong attitude for a database product trying to compete in
what is turning into a more and more crowded field. I'm not using ODBC
for one simple reason - I don't want the users of my library to have to
configure an ODBC interface to their database, etc. Nor do I want them
to have to install anything extra apart from postgres that they may have
to install for ODBC support. I'm trying to keep my library as native as
possible with database interfaces, which is why I'm not even using
libpq++.

For now, I'm going to have to hack my way around these issues, with a
series of ugly hacks - but there is no way I could ever in good
concience recommend any database application developer who is writing
code in C or C++ use postgres as their back end database without serious
API changes that will actually provide decent information on standard
types and column/table/database configuration. Without having to
manually do some selects, of which are not documented anywhere.

For example, there is nowhere in the docs that tells me "To get the OID
of column 'y' in table 'x', do:
SELECT atttypid FROM pg_attribute
WHERE attrelid = (SELECT relfilenode FROM pg_class
WHERE relname = 'x')
AND attname = 'y'
". That would be an extremely useful statement to have in the docs.
Then even if the API did not support such blatantly obvious things like
this, I would at least know how to hack it without having to research
what I imagine would be commonly needed functionality in any reasonably
complex application or library where the application/library is not
supposed to know, or rely on the underlying data types.

--
PreZ
Founder
The Neuromancy Society
http://www.neuromancy.net

In response to

Responses

Browse pgsql-interfaces by date

  From Date Subject
Next Message Kris Jurka 2004-03-28 05:13:15 Re: Portable interfaces ...
Previous Message Peter Eisentraut 2004-03-26 21:46:50 Re: PostgreSQL-Perl Interface installation