Re: Can Postgres Not Do This Safely ?!?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Karl Pickett <karl(dot)pickett(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Can Postgres Not Do This Safely ?!?
Date: 2010-10-29 14:49:14
Message-ID: AANLkTinNbThWs_jVRXS3Fcvjk0qU9Non_71E8MHn2Bpq@mail.gmail.com
Lists: pgsql-general

On Thu, Oct 28, 2010 at 10:04 PM, Karl Pickett <karl(dot)pickett(at)gmail(dot)com> wrote:
> Hello Postgres Hackers,
>
> We have a simple 'event log' table that is insert only (by multiple
> concurrent clients).  It has an integer primary key.  We want to do
> incremental queries of this table every 5 minutes or so, i.e. "select
> * from events where id > LAST_ID_I_GOT" to insert into a separate
> reporting database.  The problem is, this simple approach has a race
> that will forever skip uncommitted events.  I.e., if 5000 was
> committed sooner than 4999, and we get 5000, we will never go back and
> get 4999 when it finally commits.  How can we solve this?  Basically
> it's a phantom row problem but it spans transactions.
>
> I looked at checking the internal 'xmin' column but the docs say that
> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
> value.  I don't get it.  All I want is to make sure I skip over any
> rows that are newer than the oldest currently running transaction.
> Has nobody else run into this before?

You don't have a sequence problem so much as a wrong implementation
problem. Sequence values are always *grabbed* in order, but they can
hit the table out of order, and there is a time lag between when the
sequence value is generated and when the transaction commits. If I
issue 'begin', insert a log record, and hold the commit for 5
minutes, you are going to skip that record, because you are only
looking at the last processed record. Your algorithm is going to fail
whether you use a sequence, a timestamp, or a gapless sequence to
manage your queue position. You need to divide your log records into
two logical sets, processed and unprocessed, and look at the
unprocessed set as a whole.
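
To make the race concrete, here's roughly how the interleaving plays
out (the table and id values are just for illustration):

-- session A
BEGIN;
INSERT INTO events (...) VALUES (...);  -- nextval() hands out id 4999

-- session B, autocommit
INSERT INTO events (...) VALUES (...);  -- nextval() hands out id 5000,
                                        -- commits immediately

-- reporting job polls now
SELECT * FROM events WHERE id > 4998;   -- sees only 5000, remembers
                                        -- last_id = 5000

-- session A, five minutes later
COMMIT;  -- 4999 finally becomes visible, but the next poll asks for
         -- id > 5000 and never picks it up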

I would suggest staging your unprocessed records to a queue table and
having your writer consume them and move them to a processed table.
You can also look at already built queuing implementations like PGQ
written by our spectacularly skilled friends at Skype (haven't used it
myself, but I've heard it's good!).
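
As a rough sketch of that staging idea (all the table and column
names below are made up, and the table lock is just the simplest way
to keep the two statements consistent; PGQ does this much more
cleverly):

-- insert-only staging table that the application writes to
CREATE TABLE event_queue (
    id      bigserial PRIMARY KEY,
    payload text NOT NULL
);

-- destination the reporting writer moves rows into
CREATE TABLE event_processed (
    id           bigint PRIMARY KEY,
    payload      text NOT NULL,
    processed_at timestamptz NOT NULL DEFAULT now()
);

-- the writer, run every five minutes or so
BEGIN;
-- wait for in-flight inserts to finish and hold off new ones until we
-- commit, so the INSERT and DELETE below operate on the same set of rows
LOCK TABLE event_queue IN SHARE ROW EXCLUSIVE MODE;
INSERT INTO event_processed (id, payload)
    SELECT id, payload FROM event_queue;
DELETE FROM event_queue;
COMMIT;

Nothing gets skipped here because a row is either still sitting in
event_queue or has already been moved, never invisible in between;
the price is that inserters stall briefly while the move runs, which
is the sort of thing a ready-made implementation like PGQ is designed
to avoid.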

merlin
