Re: change natural column order

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Dann Corbit" <DCorbit(at)connx(dot)com>
Cc: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: change natural column order
Date: 2004-12-01 04:58:10
Message-ID: 87mzwy7gyl.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


"Dann Corbit" <DCorbit(at)connx(dot)com> writes:

> Putting "SELECT * FROM <table_name>" into a compiled program using libpq or
> ESQL is a code defect. Period.
>
> ALTER TABLE ADD COLUMN /* Most frequent defect maker for SELECT * */

Whatever are you talking about? I've never tried ESQL precisely because it
seems a terribly old-fashioned interface. Does it not support referring to
columns by name? Even in libpq you can use PQfnumber or load all the columns
found with PQfname into an associative array like higher level language
drivers typically do automatically.

> DROP TABLE/CREATE TABLE /* New version may have the same name and the
> same number of columns, and they may even have the same data types but there is
> no guarantee that the meaning is the same. */

Um, well, there was no guarantee the meaning was the intended meaning in the
first place except for your own interface documentation. Obviously if you
replace the table with a new one you're either maintaining compatible
semantics or else you're changing the interface and it will require some code
changes.

> It is a defect of equal magnitude to assume that columns are returned in any
> particular order unless specified in a column list (again, from a program and
> not interactively).

Actually the spec does guarantee that the columns have a fixed defined
ordering. However I would agree it would be a poor design to depend on that
ordering since it's not self-documenting and requires close synchronization
between far distant pieces of code.

But then I think it's bad to depend on ordering even when it is an explicitly
listed column list because it requires close synchronization between two
pieces of code even if they aren't too far distant. I prefer referring to
columns by name in all but the simplest cases because it means the only
synchronization is the presence or lack of a column, not the precise position
in the output list. I'm free to add columns to a select list in the logical
position without having to make adjustments elsewhere in the code.

> Another typical defect is to assume that columns come backed ordered by the
> primary key if the table is clustered on the primary key column. You can
> have a page split with many database systems and so there is no guarantee
> that data will be returned in order without an ORDER BY clause -- clustered
> or not.

You're confusing columns with rows. Without an ORDER BY clause there's no
guarantee of the order of the rows. Not because of page splits or any other
specific technical reason. There just isn't. There are any number of reasons
the database might choose another ordering to return rows.

In the case of Postgres clustering doesn't really "work" anyways so *any*
sequential table scan without an explicit ORDER BY will be out of order unless
you haven't made any data changes since the clustering.

> Any of (ASSUME NO COLUMN CHANGES/ASSUME COLUMN ORDER/ASSUME CLUSTERED KEY SORT
> ORDER) would cause me to fail code in a code review.
>
> IMO-YMMV

Well with Postgres and reasonably modern drivers none of the above impact at
all on whether "SELECT *" is a good idea or not.

Like I said in my post. Because of historical problems with various other
databases the rule of thumb that "SELECT *" is bad has become entrenched in
standard DBA dogma.

One of those historical problems is that some interfaces depend on column
ordering in result sets and have particular problems dealing when the columns
change in any way. As you point out this can happen for lots of reasons, not
just because "SELECT *" is used. Thankfully nowadays we get to use much more
flexible interfaces that find columns based on column names that don't suffer
from these problems.

--
greg

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jan Wieck 2004-12-01 05:08:36 Re: [ANNOUNCE] USENET vs Mailing Lists Poll ...
Previous Message Tom Lane 2004-12-01 04:56:32 Re: Upcoming Changes to News Server ...