Batch insert order and current_timestamp

From: Agustin CS <agustincs1(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Batch insert order and current_timestamp
Date: 2009-03-06 11:59:17
Message-ID: 9fb413bd0903060359j6659f02bmab90b2eb6b1160f6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-jdbc

Hi all

I'm getting a strange issue with row inserts using
PreparedStatement/Statement batches.

I'm trying to use a table as a queue, so lower levels receive data from
devices (changes in their attributes)
and add those changes into the table (one row per change) using batches.
>From upper levels another thread each several seconds wakes up and checks
for rows in this table.
If some of them are available it reads a certain number (for example 100)
and after that those 100 are removed,
so next time upper level will try to read other 100, and so on.

The table looks like this:
ID sequence self-increment (inc = +1)
timestamp
otherFields...

The insert query used for the batch at the PreparedStatement looks like
this:
insert into table (timestamp,otherFields) values(current_timestamp,...) so
the default value (nextval(sequence))
is used for the ID column.

I was expecting to have a global order only by using this ID as order key,
but according to an answer from
people of #postgresql(at)freenode(dot)org it seems that's not enough. It seems two
consecutive inserts can get
disordered, that is, the first one could have a lower ID but a newer
timestamp (assuming that have been
inserted from different batches, for example if lower level includes
multiple threads inserting rows using their
own batches).

Considering that I can see all rows inserted in the same batch having the
same timestamp, the right way
to select rows from the upper level should be ORDER BY timestamp,ID.

Anyway, if this "disorder" can really happen then the global ordering is
lost. For example:
- INSERT 50 rows from lower levels from low-thread 1
- SELECT FROM UPPER LEVELS
- INSERT 50 rows from lower levels from low-thread 2

Because of this problem insert1-row50 may get a lower ID, but a newer
timestamp than insert2-row1.

Then if the upper level is extracting 50 rows each time and launches the
query right in the middle of those
inserts it will get a wrong "global order", even if it's using the ORDER BY
timestamp,ID. That is, each
select is locally ordered (it's guaranteed by the ORDER clause) but there's
a disorder between 2 consecutive
selects.

I'm asking about this because i've found exactly this problem, two rows in
the table where the first one has lower
id but newer timestamp. Right now assuming that each one would be part of
different selects is a situation
forced by me teorically, because i think there is a possible problem there.

To summarize, the questions are:
- if i launch a batch having row1, row2 and row3 (inserted in this order
into the batch) in a table with a sequence PK
and another timestamp column (with current_timestamp as value) can i be sure
that lower id means "not newer" timestamp?
- in the table will i allways get the rows ordered in this way related to
the ID? i mean, row1 will allways have lower ID
than row2 and row3, and so on?
- what about locking the table for inserts? it may be the last way to do
this, but depending on the concrete scenario
maybe it would be a good solution.

Thx in advance and regards

Browse pgsql-jdbc by date

  From Date Subject
Next Message Александър Шопов 2009-03-06 12:25:20 Re: Documentation for working with PG interval types via JDBC
Previous Message Dave Cramer 2009-03-06 11:40:51 Re: Forcing postgres NOT to use sequential scan, trough JDBC