From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Ostrovsky Eugene <e79ene(at)yandex(dot)ru>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: modification time & transaction synchronisation problem
Date: 2010-04-20 05:14:49
Message-ID: 4BCD3849.3020304@postnewspapers.com.au
Lists: pgsql-general
Craig Ringer wrote:
> On 19/04/2010 4:51 PM, Craig Ringer wrote:
>
>> There might be another possible approach that uses the system
>> "xmin/xmax" fields of each tuple. That'd permit your incremental dumps
>> to be done read-only, saving you a whole lot of expensive I/O and bloat.
>> I'm just not sure what I'm thinking of will work yet.
>
> Yeah. You can use a SERIALIZABLE transaction and
> txid_current_snapshot(), which almost magically solves your problem. In
> a single call it provides all the details about active and committed
> transactions at the time of snapshot creation that you need. It even
> gives you a list of transaction IDs for uncommitted transactions between
> those ranges so old uncommitted transactions don't force you to
> repeatedly dump data. That's all you need to know to do intelligent
> incremental backup of a table.
>
> I haven't written the actual test code, but what you should need to do
> (according to my probably flawed understanding) is:
>
> - Begin a read only SERIALIZABLE transaction
Actually, it looks like READ COMMITTED is preferable, as it gives you
information about any concurrently running transactions in
txid_current_snapshot(), and you can do the actual dump in one statement
anyway.
> - Record txid_current_snapshot(), which you will need
> for the next run. We'll call the value of the last
> run's txid_current_snapshot() call 'txprev'.
> - SELECT all rows that have:
>       tablename.xmin >= txid_snapshot_xmax(txprev) OR
>       tablename.xmin IN (SELECT txid_snapshot_xip(txprev))
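Put together, the steps above might look something like this. This is only a sketch: "tablename" is a placeholder, txprev would be the snapshot text saved from the previous run, and the xid-to-bigint cast via text ignores the transaction ID epoch, which is a simplification.

```sql
BEGIN TRANSACTION READ ONLY;

-- Save this value somewhere; it becomes txprev for the next run.
SELECT txid_current_snapshot();

-- Dump every row version created since the last run. Rows with
-- xmin >= the old snapshot's xmax are new (xmax is the first
-- txid that was still unassigned), and rows whose xmin is in the
-- old snapshot's in-progress list were uncommitted last time.
SELECT *
FROM   tablename
WHERE  tablename.xmin::text::bigint >=
           txid_snapshot_xmax('1234:1240:1236'::txid_snapshot)
   OR  tablename.xmin::text::bigint IN
           (SELECT txid_snapshot_xip('1234:1240:1236'::txid_snapshot));

COMMIT;
```

The snapshot literal '1234:1240:1236' is an invented example of the xmin:xmax:xip-list text form that txid_current_snapshot() returns.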
I've tested this approach and it appears to work fine, *but* only for
append-only tables.
I don't seem to be smart enough to figure out how to correctly handle
UPDATEs and DELETEs. I know the xmax system column (which records the
ID of the transaction that deleted or superseded a row version) is the
key, but once the deleting transaction commits the old row version is
no longer visible to ordinary queries, so I can't figure out a way to
effectively use xmax to record deletions. I guess I don't understand
MVCC - or at least Pg's implementation of it - as well as I thought I
did (which isn't very well).
Unless someone smarter steps in or I have the time to learn more of the
details about this, I probably can't offer a pre-formed solution to your
problem.
You can always use a trigger-maintained change history table to track
inserts/updates/deletes, and DELETE FROM ... RETURNING it. That's simple
and easy, but generates plenty of extra I/O to do your progressive
backup/copy.
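A minimal sketch of that trigger-based scheme, with invented names throughout and assuming "tablename" has an integer primary key called "id":

```sql
-- Change log populated by a trigger on the tracked table.
CREATE TABLE tablename_changes (
    change_time timestamptz NOT NULL DEFAULT now(),
    operation   char(1)     NOT NULL,  -- 'I', 'U' or 'D'
    row_id      integer     NOT NULL   -- PK of the affected row
);

CREATE OR REPLACE FUNCTION tablename_log_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO tablename_changes (operation, row_id)
        VALUES ('D', OLD.id);
        RETURN OLD;
    ELSE
        -- substring(TG_OP, 1, 1) yields 'I' for INSERT, 'U' for UPDATE
        INSERT INTO tablename_changes (operation, row_id)
        VALUES (substring(TG_OP, 1, 1), NEW.id);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tablename_change_log
AFTER INSERT OR UPDATE OR DELETE ON tablename
FOR EACH ROW EXECUTE PROCEDURE tablename_log_change();

-- Each backup run drains the log in a single statement:
DELETE FROM tablename_changes RETURNING *;
```

Every write to the tracked table now costs an extra insert (and later a delete) on the log table, which is the extra I/O mentioned above.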
I'm going to stop talking to myself now.
--
Craig Ringer