Selecting a constant question: A summary

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: "Larry McGhaw" <lmcghaw(at)connx(dot)com>
Subject: Selecting a constant question: A summary
Date: 2007-06-12 20:41:00
Message-ID: D425483C2C5C9F49B5B7A41F894415470100072D@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

First a comment:

At CONNX Solutions Inc., we believe sincerely that we should do whatever
is necessary to make our customers prosper. This means creation of
excellent tools and being responsive to customer needs. Secondly, we
believe that we should treat the customers the way that we want to be
treated.

I think that the PostgreSQL group has managed the first objective, but
not the second. Of course, that is only an opinion, but I think that
success hinges on both factors. Our objective in this issue has also
been to improve PostgreSQL so that it can become more useful to the end
users and not to denigrate the work of the engineers that have toiled on
it. I will also admit that frustration has caused our tone to become
sharp at times. This is clearly a mistake on our part and for this, I
apologize.

Next, the problem:

According to SQL/CLI and ODBC 3.5, we should bind the length of a
character column.

Here are some references from the relevant documentation (SQL/CLI and
ODBC are clones of one another):

========================================================================
======

ANSI/ISO/IEC 9075-3-1999

for Information Technology

Database Language SQL

Part 3: Call-Level Interface (SQL/CLI)

Section 6.5 BindCol

Along with function SQLBindCol from the ODBC specification

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/ht
m/odbcsqlbindcol.asp

This bit should be sufficient to explain what we are after:

"BufferLength [Input]

Length of the *TargetValuePtr buffer in bytes.

The driver uses BufferLength to avoid writing past the end of the
*TargetValuePtr buffer when returning variable-length data, such as
character or binary data. Note that the driver counts the
null-termination character when returning character data to
*TargetValuePtr. *TargetValuePtr must therefore contain space for the
null-termination character or the driver will truncate the data.

When the driver returns fixed-length data, such as an integer or a date
structure, the driver ignores BufferLength and assumes the buffer is
large enough to hold the data. It is therefore important for the
application to allocate a large enough buffer for fixed-length data or
the driver will write past the end of the buffer.

SQLBindCol returns SQLSTATE HY090 (Invalid string or buffer length) when
BufferLength is less than 0 but not when BufferLength is 0. However, if
TargetType specifies a character type, an application should not set
BufferLength to 0, because ISO CLI-compliant drivers return SQLSTATE
HY090 (Invalid string or buffer length) in that case."

========================================================================
======

Now, there are times when (according to the spec) we have to defer
binding. However, this causes great problems for end user tools and
should only be done in what is basically a dire emergency.

In the case of a SELECT query that selects a fixed constant of any sort,
it would be a definite improvement for PostgreSQL to give some sort of
upper maximum.

For example:

SELECT "Eastern Division", sum(Inventory_level),
sum(Inventory_backorder), Manager_last_name FROM <table_name> WHERE
division_id = 9 GROUP BY Manager_last_name

Will return 3 columns of data. The first column is of unknown length.
Imagine if you are a spreadsheet in OpenOffice:

http://www.openoffice.org/

which happens to support ODBC connections. You would like to fill out a
report for the president of your company. Unfortunately, the first
column is of "unknown length"

That makes it a bit difficult to format this spreadsheet.

Now, I will admit that we may not know a-priori if "Eastern Division" is
character or Unicode or MBCS. But in the worst case scenario it will be
(16 + 1) * element_width bytes in length. For some Unicode character
sets, element_width can be as much as 4, so that leaves 68 octets as an
upper possible maximum.

Now, you might protest, 68 bytes might be much too large. That is true,
but I know that if I allocate 68 bytes we will not have data truncation.
It is no worse than a varchar(255) field that has a largest item 15
characters wide in it. The grid will successfully bind and we will be
able to produce the report.

Generally speaking, grids are smart enough to automatically resize
themselves to max_length(grid_column_title, grid_column_data) and so the
report will look very nice.

It is also true that it is possible for us to work around the problem.
We certainly can know the exact type information about the constants in
our queries and reformat the PostgreSQL queries to decorate them with
things like:

SELECT "Eastern Division"::char(16),
sum(Inventory_level)::Numeric(16,4), sum(Inventory_backorder)
::Numeric(16,4), Manager_last_name FROM <table_name> WHERE division_id =
9 GROUP BY Manager_last_name

But it would be very nice if the database could provide a good estimate
for us so that PostgreSQL could work like all of the other database
systems. Code full of kludges is harder to maintain.

And so I hope that we can get off on a better foot this time. If the
answer is "No, the priority for this sort of thing is low, and we do not
consider it important for our customers." Then we will have to work
around it. Hopefully, at least, it will get put into a queue of future
enhancements.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dann Corbit 2007-06-12 20:42:32 Re: Selecting a constant question
Previous Message Larry McGhaw 2007-06-12 20:40:19 Re: Selecting a constant question