Re: Race condition in resetting a sequence

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: Lew <lew(at)lewscanon(dot)nospam>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Race condition in resetting a sequence
Date: 2007-08-10 21:00:09
Message-ID: dcc563d10708101400i406d4e79qdcfe5bd8668d0bcc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 8/4/07, Lew <lew(at)lewscanon(dot)nospam> wrote:
> Steve Midgley writes:
> >> The code I provided to reset a primary key sequence is actually part of
> >> Ruby on Rails core library - actually they use something very similar
> >> to what I originally sent:
> ...
> >> SELECT setval('#{sequence}', (SELECT COALESCE(MAX(#{pk})+(SELECT
> >> increment_by FROM #{sequence}), (SELECT min_value FROM #{sequence}))
> >> FROM #{table}), false)
>
> Tom Lane wrote:
> > Ugh. That's completely unsafe/broken, unless they also use locking that
> > you didn't show.
> ...
> > It doesn't have a race condition "all by itself": it will do what it's
> > told. The problem with commands such as the above is that there's a
> > time window between calculating the max() and executing the setval(),
> > and that window is more than large enough to allow someone else to
> > insert a row that invalidates your max() computation. (Because of MVCC
> > snapshotting, the risk window is in fact as long as the entire
> > calculation of the max --- it's not just a few instructions as some
> > might naively think.)
> >
> > Now it is possible to make this brute-force approach safe: you can lock
> > the table against all other modifications until you've applied your own
> > changes. But you pay a high price in loss of concurrency if you do
> > that.
>
> All this trouble over semantically-significant ID columns seems to support the
> camp that excoriates use of artificial ID columns and autoincrementation
> altogether.

I don't that it does that at all. If properly implemented, sequences
provide a race free way to assign unique ids to a table that otherwise
would have no natural primary key. I think that having a religious
preference one way or the other means you might use a suboptimal
method in some cases. In the db I work on, we have artificial pks of
sequences, artificial pks made from a large list of pre-created ids (6
alphanum char locator codes, common in the travel industry) and still
other tables that have natural primary keys. Each has a reason for
being the way it is.

I can't see this saying natural pks, which are often open to
interpretation and changes over time are any less are any better than
artificial pks from a sequence at all. I can see it saying that the
people who developed RoR framework were NOT database specialists.

> The usual argument in their favor is that they speed up performance, but this
> epicyclic dance to accomodate FK references to autoincremented keys makes the
> case that there is also a performance penalty, and in the more critical
> performance area of code development and correctness than in the less critical
> search speed area.

Sorry, but that's only one of the two arguments. The other one is
that natural pks are often impossible due to the dataset not being
able to be unique because of the business constraints.

In an airline reservation system, you might start with last and first
name for a pk. Then add phone number. except some people don't wanna
give their phone numbers, so you use steet address, only to find out
that John Smith with no phone lives on 123 Acorn St in Portland while
another John Smith with no phone lives on 123 Acorn St in
Phillidelphia.

And what do you do when they call back about their ticket?

Seriously, the natural primary key method has as many problems as the
artificial one.

In the USAF, they used to use a system to keep track of your medical
records. The first letter of your last name, followed by the last
four of your SSN. So, Ken Johnson might have an SSN of 123-45-6789
While Patricia Jackson might have an SSN of 453-89-6789.

Imagine Ken's surprise when his doctor tells him he's pregnant. Or
Patricia's surprise to have a limb amputated when she went in for a
problem with heart murmers.

By the time you add enough fields together to get a reliable primary
key, you might have a multi-dimensional monster that kills
performance.

You can't just religiously say that one or the other is the only answer.

Finally the cost of software development is only a small part of the
overall cost of creating and deploying a piece of code. Last place I
worked had a transactional database developed really quickly by non-db
people that used isam tables (no, not mysql with myisam, just isam)
that corrupted records dozens of times a day and required a team of
support people to keep it running semi-smoothly and hours of downtime
each night to be cleaned up. The initial savings on development costs
were eaten up rather quickly by the ongoing support costs.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Tom Lane 2007-08-10 21:30:14 Re: Index usage in order by with multiple columns in order-by-clause
Previous Message Fernando Hevia 2007-08-10 19:40:34 Re: Best Fit SQL query statement