Re: modification time & transaction synchronisation problem

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Ostrovsky Eugene <e79ene(at)yandex(dot)ru>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: modification time & transaction synchronisation problem
Date: 2010-04-20 05:14:49
Message-ID: 4BCD3849.3020304@postnewspapers.com.au
Lists: pgsql-general

Craig Ringer wrote:
> On 19/04/2010 4:51 PM, Craig Ringer wrote:
>
>> There might be another possible approach that uses the system
>> "xmin/xmax" fields of each tuple. That'd permit your incremental dumps
>> to be done read-only, saving you a whole lot of expensive I/O and bloat.
>> I'm just not sure what I'm thinking of will work yet.
>
> Yeah. You can use a SERIALIZABLE transaction and
> txid_current_snapshot(), which almost magically solves your problem. In
> a single call it provides all the details about active and committed
> transactions at the time of snapshot creation that you need. It even
> gives you a list of transaction IDs for uncommitted transactions between
> those ranges so old uncommitted transactions don't force you to
> repeatedly dump data. That's all you need to know to do intelligent
> incremental backup of a table.
>
> I haven't written the actual test code, but what you should need to do
> (according to my probably flawed understanding) is:
>
> - Begin a read only SERIALIZABLE transaction

Actually, it looks like READ COMMITTED is preferable, as it gives you
information about any concurrently running transactions in
txid_current_snapshot(), and you can do the actual dump in one statement
anyway.
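For reference, txid_current_snapshot() returns a value of the form xmin:xmax:xip_list, which is everything the next run needs. A minimal sketch (the txid values shown are purely illustrative):

```sql
-- Capture the snapshot to store for the next incremental run.
-- The result has the form xmin:xmax:xip_list, e.g. '1234:1240:1235,1237':
--   1234       - oldest transaction still in progress at snapshot time
--   1240       - first txid not yet assigned ("everything >= this is future")
--   1235,1237  - in-progress txids between the two bounds
SELECT txid_current_snapshot();
```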

> - Record txid_current_snapshot(), which you will need
> for the next run. We'll call the value of the last
> run's txid_current_snapshot() call 'txprev'.
> - SELECT all rows that have:
> tablename.xmin >= txid_snapshot_xmax(txprev) OR
> tablename.xmin IN (SELECT txid_snapshot_xip(txprev))

I've tested this approach and it appears to work fine, *but* only for
append-only tables.
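To make that concrete, here is a rough sketch of what I tested, against a hypothetical append-only table "items", with the previous run's snapshot supplied as the psql variable :'txprev' (both names are mine, not from any standard):

```sql
BEGIN;
-- Record the snapshot that becomes txprev for the *next* run.
SELECT txid_current_snapshot();

-- Dump every row committed since the previous run: anything whose xmin
-- is at or past the old snapshot's xmax (xmax is the first txid that was
-- not yet assigned), plus rows written by transactions that were still
-- in progress when the old snapshot was taken.
SELECT *
FROM items
WHERE items.xmin::text::bigint >= txid_snapshot_xmax(:'txprev'::txid_snapshot)
   OR items.xmin::text::bigint IN (
        SELECT txid_snapshot_xip(:'txprev'::txid_snapshot)
   );
COMMIT;
```

Note the xid column can't be cast to bigint directly, hence the ugly ::text::bigint detour; this also ignores txid epoch wraparound, which a production version would have to handle.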

I don't seem to be smart enough to figure out how to correctly handle
UPDATEs and DELETEs. I know the xmax field (which records the ID of the
transaction that deleted or superseded the row) is key, but I can't
figure out a way to use it effectively to record deletions, since the
old row version stops being visible to the dumping transaction. I guess
I don't understand MVCC, or at least Pg's implementation of it, even as
well as I thought I did (which isn't very).

Unless someone smarter steps in or I have the time to learn more of the
details about this, I probably can't offer a pre-formed solution to your
problem.

You can always use a trigger-maintained change history table to track
inserts/updates/deletes, and DELETE FROM ... RETURNING it. That's simple
and easy, but generates plenty of extra I/O to do your progressive
backup/copy.
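For completeness, a minimal sketch of that trigger approach, again using a hypothetical "items" table with an integer primary key "id" (all names here are illustrative):

```sql
-- Change-history table the trigger appends to.
CREATE TABLE items_changes (
    change_id  bigserial PRIMARY KEY,
    item_id    integer NOT NULL,
    op         char(1) NOT NULL,            -- 'I', 'U' or 'D'
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION items_log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO items_changes (item_id, op) VALUES (OLD.id, 'D');
        RETURN OLD;
    ELSE
        INSERT INTO items_changes (item_id, op)
        VALUES (NEW.id, substring(TG_OP from 1 for 1));
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_change_log
AFTER INSERT OR UPDATE OR DELETE ON items
FOR EACH ROW EXECUTE PROCEDURE items_log_change();

-- Each backup run then atomically consumes the accumulated changes:
-- DELETE FROM items_changes RETURNING *;
```

The DELETE ... RETURNING at the end is what lets each run pick up exactly the changes since the last one, at the cost of the extra write I/O mentioned above.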

I'm going to stop talking to myself now.

--
Craig Ringer
