Automatic error rollback can be harmful

From: "Wayne Armstrong" <wdarmst(at)bacchus(dot)com(dot)au>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Automatic error rollback can be harmful
Date: 2003-05-24 14:49:46
Message-ID: 200305241449.h4OEnmbv002338@mail.bacchus.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,
I have been using postgresql for about 3 months now (after about 15 years of
using various commercial databases).
I love it :) but - (and you just knew that but was coming :) there is one thing
that really causes me grief
.
It is the tendency for postgres to rollback automatically on some errors.

What this leads to is the need for an application to commit more frequently
tahn is really desirable.

Example1.
During import of 120 thousand records from an isam file system, 3 say records
fail integrity checks ( files in non-database systems tend not to have
referential integrity implemented on them except at an application level). The
desired result is to drop the records failing integrity checks. Importing into
db2 or oracle say - I have the option to ignore the referential integrity
errors (i get the error code on the insert anyway), and continue with the
insert of the 120 thousand - 3 records. In postgres, I either have to commit
after every record write, or guarantee the "cleanness" of the data before I
begin the import - which is sometimes difficult given the data sources I may be
importing from. Worse if the intention behind the automatic rollback is to
guarantee data purity (from the postgresql manual 10.4.1 -
" Turn off autocommit and just do one commit at the end. (In plain SQL, this
means issuing BEGIN at the start and COMMIT at the end. Some client libraries
may do this behind your back, in which case you need to make sure the library
does it when you want it done.) If you allow each insertion to be committed
separately, PostgreSQL is doing a lot of work for each record added. An
additional benefit of doing all insertions in one transaction is that if the
insertion of one record were to fail then the insertion of all records inserted
up to that point would be rolled back, so you won't be stuck with partially
loaded data."
It actually makes more of a mess than leaving if or not to rollback under
application control. If in this example the three "bad" records are scattered
throughout the dataset, I could end up (if I ignore the error as i would do
for most other dbms), with a random number of records in the table. etc etc.
Of course also, to do robust imports of the insert ifError update style
avaiilable in most other (commercial) dbms
it is also nescessary to issue a commit before each insert/update attempt.

Example2 - ODBC driver rollbacks :-

Here is a snippet from a postgres log :-
2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n, p
_class c where c.relnamespace=n.oid and c.oid='select'::regclass
2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace
, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec

The rollback here is generated by the odbc driver to clear an error created by
the odbc driver incorrectly parsing the select from a subselect statement as a
table name. The application is totally unaware an error has occured and a
rollback has been issued. This is most likely to lead to data loss, unless,
basically, an application using odbc is in autocommit mode or commits after
every sql statement issued. This concerns me to the piont where I would really
recommend not using the parse statement option in the odbc driver (even though
that reduces the odbc drivers ability to mimic a prepare which is also
problematic :) unless you are using autocommit.

Thoughts and comments ?

Regards,
Wayne Armstorng
Bacchus Management Systems
http://www.bacchus.com.au

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Eric D Nielsen 2003-05-24 15:56:17 Query error=Implicit ABORT?
Previous Message Martin Marques 2003-05-24 13:50:17 Re: Postgresql on SUN Server