Re: Can Postgres Not Do This Safely ?!?

From: Karl Pickett <karl(dot)pickett(at)gmail(dot)com>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Can Postgres Not Do This Safely ?!?
Date: 2010-10-29 13:50:04
Message-ID: AANLkTikU6iaAXX7_MWru5ZnBuJi2idyiau-5F1ZXw3_L@mail.gmail.com
Lists: pgsql-general

On Fri, Oct 29, 2010 at 2:53 AM, Peter Geoghegan
<peter(dot)geoghegan86(at)gmail(dot)com> wrote:
> On 29 October 2010 03:04, Karl Pickett <karl(dot)pickett(at)gmail(dot)com> wrote:
>> Hello Postgres Hackers,
>>
>> We have a simple 'event log' table that is insert only (by multiple
>> concurrent clients).  It has an integer primary key.  We want to do
>> incremental queries of this table every 5 minutes or so, i.e. "select
>> * from events where id > LAST_ID_I_GOT" to insert into a separate
>> reporting database.  The problem is, this simple approach has a race
>> that will forever skip uncommitted events.  I.e., if 5000 was
>> committed sooner than 4999, and we get 5000, we will never go back and
>> get 4999 when it finally commits.  How can we solve this?  Basically
>> it's a phantom row problem but it spans transactions.
>>
>> I looked at checking the internal 'xmin' column but the docs say that
>> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
>> value.  I don't get it.  All I want is to make sure I skip over any
>> rows that are newer than the oldest currently running transaction.
>> Has nobody else run into this before?
>
> If I understand your question correctly, you want a "gapless" PK:
>
> http://www.varlena.com/GeneralBits/130.php
> --
> Regards,
> Peter Geoghegan
>

That's interesting, but we're fine with gaps in the range - ids that
never appear at all. We also don't want to add a performance penalty
for concurrent writers. We just don't want any id to appear (commit)
after we've already seen a later one. To clarify: we are using a
plain serial primary key, and we already have plenty of holes -
that's fine. We just want to do an incremental 'tail -f' of this
giant table (along with some joins) to feed a reporting server every
few minutes. So we're treating it like a queue, except that nothing
is deleted and absolute real-time data is not required.
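Concretely, the race we're worried about is the interleaving below
(a sketch; the payload column and the specific id values are made up
for illustration):

    -- Session A
    BEGIN;
    INSERT INTO events (payload) VALUES ('a');  -- nextval() hands out id 4999

    -- Session B, concurrently
    BEGIN;
    INSERT INTO events (payload) VALUES ('b');  -- nextval() hands out id 5000
    COMMIT;                                     -- 5000 becomes visible first

    -- Reader polls here
    SELECT * FROM events WHERE id > 4998;       -- sees only 5000; last_id := 5000

    -- Session A, later
    COMMIT;  -- 4999 is now visible, but "WHERE id > 5000" skips it forever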

It appears the theoretical options are (rough sketches of both below):

1. Start a serializable transaction and wait until all earlier
transactions are gone (by querying pg_stat_activity or something?)
2. Ignore any rows that are newer than the oldest transaction still
in progress

Both options assume that serials can never go backward relative to
the order they're handed out to connections / xids. I think that's
safe to assume?
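For (1), the waiting part might look like this, polled from the
client until it returns zero (a sketch; procpid is the
pg_stat_activity name for the backend PID in the 8.4/9.0 catalogs,
and inside a long-lived transaction you'd need
pg_stat_clear_snapshot() between polls to see fresh rows):

    -- Count transactions that started before ours and are still running.
    SELECT count(*)
      FROM pg_stat_activity
     WHERE xact_start < (SELECT xact_start
                           FROM pg_stat_activity
                          WHERE procpid = pg_backend_pid());

For (2), something along these lines (a sketch; the xid-to-bigint
cast has to go through text, and this glosses over the 32-bit xmin
vs. 64-bit txid mismatch from my original question):

    -- Only take rows whose inserting transaction is older than the
    -- oldest transaction still running in our current snapshot.
    SELECT *
      FROM events
     WHERE id > 5000  -- LAST_ID_I_GOT
       AND xmin::text::bigint
           < txid_snapshot_xmin(txid_current_snapshot());

If that's right, (2) may not even need the ordering assumption,
since it keys on the inserting xid rather than the id.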

Either would be fine; I just don't know whether they're officially
supported by Postgres.

--
Karl Pickett
