
Re: Can Postgres Not Do This Safely ?!?

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Karl Pickett <karl(dot)pickett(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Can Postgres Not Do This Safely ?!?
Date: 2010-10-29 21:57:24
Message-ID: 4CCB4344.3010602@squeakycode.net
Lists: pgsql-general
On 10/29/2010 9:49 AM, Merlin Moncure wrote:
> On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl(dot)pickett(at)gmail(dot)com> wrote:
>> Hello Postgres Hackers,
>>
>> We have a simple 'event log' table that is insert only (by multiple
>> concurrent clients).  It has an integer primary key.  We want to do
>> incremental queries of this table every 5 minutes or so, i.e. "select
>> * from events where id > LAST_ID_I_GOT" to insert into a separate
>> reporting database.  The problem is, this simple approach has a race
>> that will forever skip uncommitted events.  I.e., if 5000 was
>> committed sooner than 4999, and we get 5000, we will never go back and
>> get 4999 when it finally commits.  How can we solve this?  Basically
>> it's a phantom row problem but it spans transactions.
>>
>> I looked at checking the internal 'xmin' column but the docs say that
>> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
>> value.  I don't get it.  All I want is to make sure I skip over any
>> rows that are newer than the oldest currently running transaction.
>> Has nobody else run into this before?
>
> You don't have a sequence problem so much as a wrong implementation
> problem.  Sequences are always *grabbed* in order but they can hit the
> table out of order and there is a time lag between when the sequence
> value is generated and the transaction commits.  If I issue 'begin',
> insert a log record, and hold the commit for 5 minutes you are going
> to skip the record because you are only looking at the last processed
> record.  Your algorithm is going to fail if you use a sequence,
> timestamp, or gapless sequence to manage your queue position.  You
> need to divide your log records into two logical sets, processed and
> unprocessed, and look at the set as a whole.
>
> I would suggest staging your unprocessed records to a queue table and
> having your writer consume them and move them to a processed table.
> You can also look at already built queuing implementations like PGQ
> written by our spectacularly skilled friends at Skype (haven't used it
> myself, but I've heard it's good!).
>
> merlin
>
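For reference, here is Merlin's staging-table idea roughly sketched out (the events_queue and events_processed names are made up for illustration; repeatable read keeps the copy and the delete looking at the same snapshot):

begin isolation level repeatable read;
-- copy everything currently visible in the queue...
insert into events_processed select * from events_queue;
-- ...and remove exactly those rows; anything committed after our
-- snapshot stays in the queue for the next run
delete from events_queue;
commit;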

Yep, you don't want a sequence.  You want a flag.

add a boolean "processed" flag, default it to false.
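Something along these lines (a sketch; only the events table name comes from the original question):

-- existing rows default to false, i.e. not yet processed
alter table events add column processed boolean not null default false;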

then every 5 minutes run this:

begin;
insert into logged select * from events where processed = false;
update events set processed = true where processed = false;
commit;

or, if you want to select them and do something to them:

begin;
select * from events where processed = false;
... do your processing on each row, which would include inserting it ...
update events set processed = true where processed = false;
commit;

Just make sure you do it all in the same transaction, and run that transaction at 
repeatable read (or serializable) isolation, so the update sees the exact same set 
of rows as the select.  Under the default read committed level each statement takes 
its own snapshot, so a row committed between the two statements could get flagged 
without ever being copied.
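If the processing happens client-side anyway, another option (a sketch, assuming 
PostgreSQL 8.2 or later for update ... returning) is to flag and fetch in a single 
statement, which sidesteps the snapshot question entirely:

-- every row this returns is exactly the set that was just flagged
update events set processed = true where processed = false returning *;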

You could also create a partial index on processed to keep track of just the rows 
that are still false.
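For example (a sketch; the index name is made up):

-- partial index covering only the not-yet-processed rows, so the
-- "where processed = false" scans stay cheap as the table grows
create index events_unprocessed_idx on events (id) where processed = false;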

-Andy
