Re: logical column ordering

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: logical column ordering
Date: 2015-03-12 13:44:20
Message-ID: 55019834.20208@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 12.3.2015 14:17, Alvaro Herrera wrote:
> Tomas Vondra wrote:
>> On 12.3.2015 03:16, Tom Lane wrote:
>
>>> I agree though that it's worth considering defining
>>> pg_attribute.attnum as the logical column position so as to minimize
>>> the effects on client-side code. I doubt there is much stuff
>>> client-side that cares about column creation order, but there is
>>> plenty that cares about logical column order. OTOH this would
>>> introduce confusion into the backend code, since Alvaro's definition
>>> of attnum is what most of the backend should care about.
>>
>> IMHO reusing attnum for logical column order would actually make it more
>> complex, especially if we allow users to modify the logical order using
>> ALTER TABLE. Because if you change it, you have to walk through all the
>> places where it might be referenced and update those too (say, columns
>> referenced in indexes and such). Keeping attnum immutable makes this
>> much easier and simpler.
>
> I think you're misunderstanding. The suggestion, as I understand it,
> is to rename the attnum column to something else (maybe, say,
> attidnum), and rename attlognum to attnum. That preserves the
> existing property that "ORDER BY attnum" gives you the correct view
> of the table from the point of view of the user. That's very useful
> because it means clients looking at pg_attribute need less changes,
> or maybe none at all.

Hmm ... I understood it as a suggestion to drop attlognum and just
define (attnum, attphysnum).

> I think this wouldn't be too difficult to implement, because there
> aren't that many places that refer to the column-identity attribute
> by name; most of them just grab the TupleDesc->attrs array in
> whatever order is appropriate and scan that in a loop. Only a few of
> these use att->attnum inside the loop --- that's what would need to
> be changed, and it should be pretty mechanical.

I think it's way more complicated. We may fix all the pieces of the
code, but that's not all - attnum is referenced in various system views,
catalogs and such. For example pg_stats view does this:

FROM pg_statistic s JOIN pg_class c ON (c.oid = s.starelid)
JOIN pg_attribute a ON (c.oid = attrelid AND attnum = s.staattnum)
LEFT JOIN pg_namespace n ON (n.oid = c.relnamespace)
WHERE NOT attisdropped
AND has_column_privilege(c.oid, a.attnum, 'select');

information_schema also uses attnum on many places too.

I see the catalogs as a kind of public API, and redefining the meaning
of an existing column this way seems tricky, especially when we
reference it from other catalogs - I'm pretty sure there's plenty of SQL
queries in various tools that rely on this. Just google for "pg_indexes
indkeys unnest" and you'll find posts like this one from Craig:

http://stackoverflow.com/questions/18121103/how-to-get-the-index-column-orderasc-desc-nulls-first-from-postgresql

specifically tell people to do this:

SELECT
...
FROM (
SELECT
pg_class.relname,
...
unnest(pg_index.indkey) AS k
FROM pg_index
INNER JOIN pg_class ON pg_index.indexrelid = pg_class.oid
) i
...
INNER JOIN pg_attribute ON (pg_attribute.attrelid = i.indrelid
AND pg_attribute.attnum = k);

which specifically tells people to match attnum vs. indkeys. If we
redefine the meaning of attnum, and instead match indkeys against a
different column (say, attidnum), all those queries will be broken.

Which actually breaks the catalog definition as specified here:

http://www.postgresql.org/docs/devel/static/catalog-pg-index.html

which explicitly says that indkey references pg_attribute.attnum.

But maybe we don't really care about breaking this API and it is a good
approach - I need to think about it and try it.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2015-03-12 13:54:19 Re: Install shared libs in lib/ and bin/ with MSVC (Was: install libpq.dll in bin directory on Windows / Cygwin)
Previous Message David Fetter 2015-03-12 13:38:50 Re: OOM-killer issue when updating a inheritance table which has large number of child tables