Re: serial columns & loads misfeature?

From: Jason Earl <jason(dot)earl(at)simplot(dot)com>
To: Kevin Brannen <kevinb(at)nurseamerica(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: serial columns & loads misfeature?
Date: 2002-06-28 19:21:30
Message-ID: 878z4z43b9.fsf@npa01zz001.simplot.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kevin Brannen <kevinb(at)nurseamerica(dot)net> writes:

> I'm new to Postgres, so sorry if this is easy, but I did search the
> docs and could find nothing to answer this...
>
> I've got a Mysql DB that I've dumped out and am trying to insert into
> a Pg DB, as we transition. There were a few changes I had to do to
> the data, but they were easy--except for the auto_increment
> columns. :-/
>
> After I created the DB, I inserted the data (thousands of inserts) via
> psql. All went well. Then I started testing the changed code (Perl)
> and when I went to insert, I got a "dup key" error.
>
> It took me awhile to figure out what was going on, but I can recreate
> the problem with:
>
> create table test (s serial, i int);
> insert into test values (1,1);
> insert into test values (2,2);
> insert into test values (3,3);
> insert into test (i) values (4);
> ERROR: Cannot insert a duplicate key into unique index test_s_key
>
> I was expecting the system to realize new "keys" had been inserted,
> and so when the "nextval" that implicitly happens on a serial field is
> run, it would "know" that it was too small and return "max(s)+1".
> [FWIW, my expectations in this area were set by my experience with
> Informix and mysql, both do this; not sure if other RDBMs do.]
>
> I realize I can fix this by writting a Perl/DBI script to read the
> list of sequences, and do [conceptually]:
>
> get the list of sequences
> from the sequence name, retrieve the table name
> n=select max(id)+1 from table;
> select setval('seq_name',n) from seq_name;
>
> and things will be fine from here after, but surely this is a common
> enough problem after a bulk load that there is something already built
> in to handle this and I just don't have it configured correctly (or is
> this a bug?).
>
> Oh, this on a RH 7.2 system with Pg 7.1.3.
>
> TIA for any help in understanding this better!
> Kevin

First of all, since you are migrating to PostgreSQL you really might
consider trying the newest version of PostgreSQL. The 7.1 series was
good, but the 7.2 series is better.

With that out of the way here's an explanation of how PostgreSQL's
serial type works. The serial type is really nothing more than a
wrapper around a PostgreSQL sequence and a table with a default value
for the insert. However, for this default to be used you have to make
sure that you don't assign s a value manually. When you insert into
test like this:

INSERT INTO test VALUES (1,1);

Then you are deliberately short-circuiting the default value. This is
useful because there are times when you don't want the next value from
your sequence. When you insert into the table like this:

INSERT INTO test (i) VALUES (4);

You didn't specify a value for s, and so PostgreSQL uses the default,
which since you haven't used a value from the sequence is still 1.
This is almost certainly what you want your application to do once you
get it ported over to PostgreSQL.

The trick, when importing data from some other source, is to make sure
that you update the sequence so that it is set to a number higher than
the the data you imported. Using your example you only need to change
one line:

CREATE TABLE test (s serial, i int);
INSERT INTO test VALUES (1,1);
INSERT INTO test VALUES (2,2);
INSERT INTO test VALUES (3,3);
SELECT setval('test_s_seq', 4);
INSERT INTO test (i) VALUES (4);

You could even do something like:

SELECT setval('test_s_seq', (SELECT max(s) + 1 FROM test));

PostgreSQL doesn't use the indexes on aggregate functions (like max())
so it would be faster on large tables to write that as:

SELECT setval('test_s_seq', (SELECT s + 1 FROM test ORDER BY s DESC
LIMIT 1));

That's somewhat ugly, but it's much much faster. Just another little
PostgreSQL hint that is likely to come in handy :).

So the trick is to simply import your data from MySQL, reset your
sequences using setval, and then make sure that your queries do not
supply a value for your serial columns so that PostgreSQL will supply
a default value from your sequence. As long as you let PostgreSQL
fill in the default value PostgreSQL's serial type should do precisely
what you need.

Jason

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Markus Wollny 2002-06-28 19:22:46 Re: One source of constant annoyance identified
Previous Message Neil Conway 2002-06-28 19:21:21 Re: serial columns & loads misfeature?