Re: 7.3 gotchas for applications and client libraries

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: 7.3 gotchas for applications and client libraries
Date: 2002-09-18 05:18:07
Message-ID: 200209180518.g8I5I7r01678@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-interfaces


I have copied Tom's fine email to:

http://www.ca.postgresql.org/docs/momjian/upgrade_7.3

and have added a mention of it in the HISTORY file:

A dump/restore using "pg_dump" is required for those wishing to migrate
data from any previous release. A summary of changes needed in client
applications is at http://www.ca.postgresql.org/docs/momjian/upgrade_7.3.

---------------------------------------------------------------------------

Tom Lane wrote:
> Bruce suggested that we need a porting guide to help people look for
> application and client-library code that will be broken by the changes
> in PG 7.3. Here is a first cut at documenting the issues.
> Comments welcome --- in particular, what have I missed?
>
> regards, tom lane
>
>
> Revising client-side code for PG 7.3 system catalogs
>
>
> Here are some notes about things to look out for in updating client-side
> code for PG 7.3. Almost anything that looks at the system catalogs is
> probably going to need work, if you want it to behave reasonably when you
> start using 7.3's new features such as schemas and DROP COLUMN.
>
> As an example, consider the task of listing the names and datatypes for
> a table named "foo". In the past you may have done this with a query like
>
> SELECT a.attname, format_type(a.atttypid, a.atttypmod)
> FROM pg_class c, pg_attribute a
> WHERE c.relname = 'foo'
> AND a.attnum > 0 AND a.attrelid = c.oid
> ORDER BY a.attnum
>
> (this in fact is exactly what 7.2 psql uses to implement "\d foo").
> This query will work perfectly well in 7.2 or 7.1, but it's broken in half
> a dozen ways for 7.3.
>
> The biggest problem is that with the addition of schemas, there might be
> several tables named "foo" listed in pg_class. The old query will produce
> a list of all of their attributes mixed together. For example, after
> create schema a;
> create schema b;
> create table a.foo (f1 int, f2 text);
> create table b.foo (f1 text, f2 numeric(10,1));
> we'd get:
>
> attname | format_type
> ---------+---------------
> f1 | text
> f1 | integer
> f2 | text
> f2 | numeric(10,1)
> (4 rows)
>
> Not good. We need to decide exactly which foo we want, and restrict the
> query to find only that row in pg_class. There are a couple of ways to
> do this, depending on how fancy you want to get.
>
> If you just want to handle an unqualified table name "foo", and find the
> same foo that would be found if you said "select * from foo", then one way
> to do it is to restrict the query to "visible" rows of pg_class:
>
> SELECT ...
> FROM ...
> WHERE c.relname = 'foo' AND pg_table_is_visible(c.oid)
> AND ...
>
> pg_table_is_visible() will only return true for pg_class rows that are in
> your current search path and are not hidden by similarly-named tables that
> are in earlier schemas of the search path.
>
> An alternative way is to eliminate the explicit join to pg_class, and
> instead use the new datatype "regclass" to look up the correct pg_class
> OID:
>
> SELECT ...
> FROM pg_attribute a
> WHERE a.attrelid = 'foo'::regclass
> AND a.attnum > 0
> ORDER BY a.attnum
>
> The regclass input converter looks up the given string as a table name
> (obeying schema visibility rules) and produces an OID constant that you
> can compare directly to attrelid. This is more efficient than doing
> the join, but there are a couple of things to note about it. One is
> that if there isn't any "foo" table, you'll get an ERROR message from
> the regclass input converter, whereas with the old query you got zero
> rows out and no error message. You might or might not prefer the old
> behavior. Another limitation is that there isn't any way to adapt
> this approach to search for a partially-specified table name;
> whereas in the original query you could use a LIKE or regex pattern to
> match the table name, not only a simple equality test.
>
> Now, what if you'd like to be able to specify a qualified table name
> --- that is, show the attributes of "a.foo" or "b.foo" on demand?
> It will not work to say
> WHERE c.relname = 'a.foo'
> so this is another way in which the original query fails for 7.3.
>
> It turns out that the regclass method will work for this: if you say
> WHERE a.attrelid = 'a.foo'::regclass
> then the right things happen.
>
> If you don't want to use regclass then you're going to have to do an
> explicit join against pg_namespace to find out which foo you want:
>
> SELECT a.attname, format_type(a.atttypid, a.atttypmod)
> FROM pg_namespace n, pg_class c, pg_attribute a
> WHERE n.nspname = 'a' AND c.relname = 'foo'
> AND c.relnamespace = n.oid
> AND a.attnum > 0 AND a.attrelid = c.oid
> ORDER BY a.attnum
>
> This is somewhat tedious because you have to be prepared to split the
> qualified name into its components on the client side. An advantage
> is that once you've done that, you can again consider using LIKE or
> regex patterns instead of simple name equality. This is essentially
> what 7.3 psql does to support wildcard patterns like "\dt a*.f*".
>
> Okay, I think we've about beaten the issue of "which foo do you want"
> to death. But what other ways are there for the schema feature to cause
> trouble in this apparently now well-fixed-up query?
>
> One way is that the system catalogs themselves live in a schema, and
> if that schema isn't frontmost in your search path then your references
> to pg_class and so forth might find the wrong tables. (It's legal now
> for ordinary users to create tables named like "pg_xxx", so long as they
> don't try to put 'em in the system's schema.) This is probably not a
> big issue for standalone applications, which can assume they know what the
> search path is. But in client-side libraries, psql, and similar code
> that has to be able to deal with someone else's choice of search path,
> we really ought to make the references to system catalogs be fully
> qualified:
>
> SELECT ...
> FROM pg_catalog.pg_namespace n, pg_catalog.pg_class c,
> pg_catalog.pg_attribute a
> WHERE ...
>
> (If you weren't using table aliases in your queries before, here is a good
> place to start...)
>
> In fact it's worse than that: function names, type names, etc also live in
> schemas. So you really ought to qualify references to built-in functions
> and types too:
>
> SELECT ..., pg_catalog.format_type(...) ...
>
> WHERE a.attrelid = 'foo'::pg_catalog.regclass ...
>
> The truly paranoid might want to qualify their operator names too,
> though I draw the line at this because of the horribly ugly syntax needed:
>
> WHERE a.attrelid OPERATOR(pg_catalog.=) 'foo'::pg_catalog.regclass
>
> There's another, non-schema-related, gotcha in this apparently simple task
> of showing attribute names: in 7.3 you need to exclude dropped attributes
> from your display. ALTER TABLE DROP COLUMN doesn't remove the
> pg_attribute entry for the dropped column, it only changes it to have
> attisdropped = true. So you will typically want to add
>
> WHERE NOT attisdropped
>
> when looking at pg_attribute.
>
> Note however that excluding dropped columns like this means there may be
> gaps in the series of attnum values you see. That doesn't bother this
> particular query, but it could easily confuse applications that expect the
> attributes to have consecutive attnums 1 to N. For example, pg_dump makes
> an array of attributes and wants to index into the array with attnums.
> It proved easier to make pg_dump include dropped attributes in its array
> (and filter them out later) than to change the indexing logic.
>
> If you have client-side code that looks in pg_proc, pg_type, or
> pg_operator then exactly the same sorts of schema-related issues appear:
> the name alone is no longer unique, you have to think about identifying
> the function, type, or operator within the schema you want.
>
> That's about all the mileage I can get out of the "show a table's
> attributes" example, but there are still more schema-related trouble
> items to check for.
>
> One problem to look for is code that scans pg_class (or another system
> table, but most commonly pg_class) to make a list of things to operate
> on. For example, various people have built scripts to automatically
> reindex every table in a database. Such code will fail as soon as you
> start using schemas, because it will find tables that aren't in your
> current schema search path and try to operate on them. Depending on what
> you want to do, you could change the code to emit fully qualified names
> of tables it wants to operate on (so it will work no matter which schema
> they are in), or you could restrict the pg_class search to find only
> visible tables.
>
> If you want to use qualified names, the straightforward way to do it
> is to join against pg_namespace to get the schema name:
>
> SELECT nspname, relname FROM pg_class c, pg_namespace n
> WHERE relnamespace = n.oid AND relname LIKE 'foo%' AND ...
>
> A less obvious way is to use the regclass output converter:
>
> SELECT c.oid::regclass FROM pg_class c
> WHERE relname LIKE 'foo%' AND ...
>
> This will give you back a table name that is qualified only if it needs to
> be (i.e., the table is not visible in your search path), so you can use
> it directly in the command you want to give next. Another interesting
> property of the regclass converter is that it will double-quote the name
> correctly if necessary --- for example, you'll get "TEST" (with the
> quotes) if the table is named TEST. So you can splice the name directly
> into a SQL command without any special pushups and be confident that it
> will produce the right results.
>
> BTW, there are similar output converters for type, function, and operator
> names, if you need them.
>
> Another thing to look for is code that tries to exclude system tables by
> excluding tablenames beginning with "pg_"; typical code is like
> WHERE relname NOT LIKE 'pg\\_%'
> or
> WHERE relname !~ '^pg_'
> This is not the preferred method anymore: the right way to do this is to
> join against pg_namespace and exclude tables that live in schemas whose
> names begin with "pg_".
>
> A related point is that temporary tables no longer have names (in the
> catalogs) of the form "pg_temp_NNN"; rather they have exactly the name
> that the creating user gave them. They are kept separate from other
> tables by placing them in schemas named like "pg_temp_NNN" (where now
> NNN identifies an active backend, not a single table). So if you wanted
> your scan to exclude temp tables then you'd definitely better change to
> excluding on the basis of schema name not table name. On the upside,
> if you do want your scan to show temp tables then it's much easier than
> before. (BTW, the pg_table_is_visible function is the best way of
> distinguishing your own session's temp tables from other people's. Yours
> will be visible, other people's won't.)
>
> Other things that are less likely to concern most applications, but could
> break some:
>
> Aggregate functions now have entries in pg_proc; pg_aggregate has lost
> most of its columns and now is just an extension of a pg_proc entry.
> If you have code that knows the difference between a plain function and
> an aggregate function then it will surely need work.
>
> pg_class.relkind has a new possible value, 'c' for a composite type.
>
> pg_type.typtype has two new possible values, 'd' for a domain and 'p' for
> a pseudo-type.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2002-09-18 05:27:26 Re: Numeric casting rules, take two
Previous Message Bruce Momjian 2002-09-18 05:10:02 Open 7.3 items

Browse pgsql-interfaces by date

  From Date Subject
Next Message Gerhard Hintermayer 2002-09-19 10:27:05 pg_exec not returning after 16 calls/ libpgtcl
Previous Message Steven O'Toole 2002-09-14 23:36:53 Re: Connecting R to PostgreSQL