Re: Can Postgres Not Do This Safely ?!?

From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Karl Pickett <karl(dot)pickett(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Can Postgres Not Do This Safely ?!?
Date: 2010-10-29 22:41:00
Message-ID: 4CCB4D7C.7000506@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 10/29/2010 07:32 AM, Karl Pickett wrote:
> On Fri, Oct 29, 2010 at 8:58 AM, Adrian Klaver<adrian(dot)klaver(at)gmail(dot)com> wrote:
>> On Thursday 28 October 2010 7:04:48 pm Karl Pickett wrote:
>>> Hello Postgres Hackers,
>>>
>>> We have a simple 'event log' table that is insert only (by multiple
>>> concurrent clients). It has an integer primary key. We want to do
>>> incremental queries of this table every 5 minutes or so, i.e. "select
>>> * from events where id> LAST_ID_I_GOT" to insert into a separate
>>> reporting database. The problem is, this simple approach has a race
>>> that will forever skip uncommitted events. I.e., if 5000 was
>>> committed sooner than 4999, and we get 5000, we will never go back and
>>> get 4999 when it finally commits. How can we solve this? Basically
>>> it's a phantom row problem but it spans transactions.
>>>
>>> I looked at checking the internal 'xmin' column but the docs say that
>>> is 32 bit, and something like 'txid_current_snapshot' returns a 64 bit
>>> value. I don't get it.
>>
>> http://www.postgresql.org/docs/8.4/interactive/functions-info.html#FUNCTIONS-TXID-SNAPSHOT-PARTS
>> "The internal transaction ID type (xid) is 32 bits wide and wraps around every 4
>> billion transactions. However, these functions export a 64-bit format that is
>> extended with an "epoch" counter so it will not wrap around during the life of
>> an installation. The data type used by these functions, txid_snapshot, stores
>> information about transaction ID visibility at a particular moment in time. Its
>> components are described in Table 9-53. "
>>
>> So:
>> Current snapshot:
>>
>> test=> SELECT txid_current_snapshot();
>> txid_current_snapshot
>> -----------------------
>> 5098:5098:
>>
>> xmin of snapshot:
>> test=> SELECT txid_snapshot_xmin(txid_current_snapshot());
>> txid_snapshot_xmin
>> --------------------
>> 5098
>> (1 row)
>
> So what happens when txid_snapshot_xmin() goes over 4 billion, and the
> table's xmin doesn't? You can't compare a 32 bit value that rolls
> over to a 64 bit that doesn't.

The long explanation is here:
http://www.postgresql.org/docs/9.0/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

The short version as I understand it is that if everything is working
correctly the XID(hence xmin) values exist in a continuous loop where 2
billion are in the past and 2 billion are in the future(assuming default
settings). At some point the old values are frozen i.e. replaced with a
special FrozenXID. This would mean that the *snapshot functions should
only return currently valid xmins. Since I have never rolled over a
database I can only speak to theory as I understand it.

>
>>
>>
>>> All I want to is make sure I skip over any
>>> rows that are newer than the oldest currently running transaction.
>>> Has nobody else run into this before?
>>>
>>> Thank you very much.
>>>
>>> --
>>> Karl Pickett
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)gmail(dot)com
>>
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Richard Broersma 2010-10-29 23:03:59 Call For Talks: PGDay LA @ SCALE 9X
Previous Message Mike Christensen 2010-10-29 22:36:27 pgAdmin on Mac connecting to Postgres 9 on Linux - SSL/timeout issue