Skip site navigation (1) Skip section navigation (2)

database schema quality survey & postgresql pitfalls

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: database schema quality survey & postgresql pitfalls
Date: 2012-01-23 10:55:39
Message-ID: alpine.DEB.2.02.1201231152010.3160@localhost6.localdomain6 (view raw or flat)
Thread:
Lists: pgsql-hackers
Dear pgdevs,

I've just completed the final version of a survey of database schema quality in
open source software. The survey covers 512 projects which use MySQL and/or
PostgreSQL for storing their data. Automatic analyses are performed by querying
the information schema. Statistical validations are computed on the results.

Here are some findings of specific interest to the list.


About MySQL vs PostgreSQL usage:

  - MySQL is much more often used than PostgreSQL... not a surprise.

  - Even for projects which seems to support PostgreSQL, this support is often
    an afterthough and not necessarily functional. For instance, you may find
    MySQL-specific syntax in the PostgreSQL-specific script, which were clearly
    never tested.

  - Projects which use PostgreSQL statistically have a better quality compared
    to projects which use MySQL. However, the difference is mostly due to
    issues with MySQL.

  - Projects which use PostgreSQL are more often maintained than projects
    with MySQL.


Some features or default behavior of PostgreSQL seem especially error-prone:

  - SERIAL attributes seem to be considered automatically as a primary key,
    so that the primary key declaration is often forgotten.

    This suggests that:

    * the documentation should insist on the potential issue.

    * a WARNING should be displayed when SERIAL is used without an associated
      PRIMARY KEY, or possibly UNIQUE. It should be very rare to desire a
      SERIAL which is not a PK, so this is worth a warning.

  - when loading a schema definition from an SQL script, the default behavior of
    "psql" is to ignore errors and go on.

    This lead to projects with failing declarations to be ignored because the
    ERROR is lost in the flow of WARNING and NOTICE. If a table is missing,
    the error will be detected because the project is not functional, but if a
    constraint is  missing, it will just be lost. This occur in about 10% of
    pg projects!

    In order to avoid this behavior, one must do a "\set ON_ERROR_STOP 1"
    at the beginning of the script. However, this is never done. Moreover,
    there is no simple way to trigger the safer behavior from the command
    line but quite a long "-v ON_ERROR_STOP=1".

    It seems to me that:

    * the documentation should suggest to use an explicit stop on error setting
      in every script.

    * a "psql -C foo.sql" (check?) or equivalent short option would help?

    * The current client default verbosity is counter productive for quality.
      It should be reduced to WARNING and above, but should *not* include NOTICE
      which add a lot of noise ignored by the user and which hides more
      important messages. For instance, a PK implies an INDEX, a SERIAL implies
      a SEQUENCE, fine, but what is the point of telling it over and over?
      So I suggest to choose a default "client_min_messages = warning".


Finally, some issues where found and already reported some time ago about the
implementation of the information schema by PostgreSQL. For instance,
auto-generated constraint names are not unique as they should be, which
makes having a "standard" information schema a little bit pointless, as
querying it returns wrong results:-( The summary of the answer was "do not use
the information schema", or "give unique names", which does not make much sense
for me who is analysing existing projects, and as most constraint names are
generated automatically by PostgreSQL.


For those interested in more details about the survey, a preprint of the paper
is available here :

     http://www.cri.ensmp.fr/classement/doc/A-488.pdf

And the tool used for analysing the projects is available at :

     http://coelho.net/salix/

Comments are welcome.

-- 
Fabien.

pgsql-hackers by date

Next:From: C├ędric VillemainDate: 2012-01-23 10:58:56
Subject: Re: Inline Extension
Previous:From: Dimitri FontaineDate: 2012-01-23 10:53:43
Subject: Re: Inline Extension

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group