Synchronized snapshots versus multiple databases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Synchronized snapshots versus multiple databases
Date: 2011-10-21 15:36:37
Message-ID: 10919.1319211397@sss.pgh.pa.us
Lists: pgsql-hackers

I've thought of another nasty problem for the sync-snapshots patch.
Consider the following sequence of events:

1. Transaction A, which is about to export a snapshot, is running in
database X.
2. Transaction B is making some changes in database Y.
3. A takes and exports a snapshot showing B's xid as running.
4. Transaction B ends.
5. Autovacuum launches in database Y. It sees nothing running in Y,
so it decides it can vacuum dead rows right up to nextXid, including
anything B deleted.
6. Transaction C starts in database Y, and imports the snapshot from A.
Now it thinks it can see rows deleted by B ... but vacuum is busy
removing them, or maybe already finished doing so.

The problem here is that A's xmin is ignored by GetOldestXmin when
calculating cutoff XIDs for non-shared tables in database Y, so it
doesn't protect would-be adoptees of the exported snapshot.
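
The sequence above can be replayed in a toy model. This is only a sketch, not PostgreSQL's code: the `Backend` class, the `get_oldest_xmin` function, and the XID values 100 and 101 are all assumptions made for illustration, and XID wraparound is ignored. It shows the DB-local cutoff in Y advancing past A's xmin, while a cutoff computed over all databases would not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical stand-in for one backend's entry in the proc array."""
    name: str
    database: str
    xid: Optional[int] = None    # the transaction's own XID, if it has one
    xmin: Optional[int] = None   # oldest XID its snapshot treats as running


def get_oldest_xmin(backends: List[Backend], next_xid: int,
                    database: Optional[str] = None) -> int:
    """Toy cutoff: smallest xid/xmin among the backends considered.

    With `database` set, backends in other databases are skipped, which
    mimics the optimization discussed in the message. With nothing
    running, the cutoff is next_xid.
    """
    result = next_xid
    for b in backends:
        if database is not None and b.database != database:
            continue
        for x in (b.xid, b.xmin):
            if x is not None and x < result:
                result = x
    return result


def replay_scenario():
    # Step 2: B is running in database Y with (assumed) XID 100.
    b = Backend("B", "Y", xid=100)
    next_xid = 101
    # Steps 1 and 3: A, in database X, takes a snapshot that shows B
    # as running, so A's xmin is 100.
    a = Backend("A", "X", xmin=100)
    backends = [a, b]
    # Step 4: B ends.
    backends.remove(b)
    # Step 5: autovacuum in Y computes its cutoff, skipping A.
    cutoff_y = get_oldest_xmin(backends, next_xid, database="Y")
    # For comparison: the cutoff if every database were considered.
    cutoff_all = get_oldest_xmin(backends, next_xid)
    return a.xmin, cutoff_y, cutoff_all


if __name__ == "__main__":
    snap_xmin, cutoff_y, cutoff_all = replay_scenario()
    print("A's xmin:", snap_xmin)
    print("cutoff in Y (other DBs ignored):", cutoff_y)
    print("cutoff over all DBs:", cutoff_all)
```

In this model the cutoff in Y is larger than A's xmin, so rows deleted by B are already eligible for removal when C imports the snapshot in step 6.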

I can see a few alternatives, none of them very pleasant:

1. Restrict exported snapshots to be loaded only by transactions running
in the same database as the exporter. This would fix the problem, but
it cuts out one of the main use-cases for sync snapshots, namely getting
cluster-wide-consistent dumps in pg_dumpall.

2. Allow a snapshot exported from another database to be loaded so long
as this doesn't cause the DB-local value of GetOldestXmin to go
backwards. However, in scenarios such as the above, C is certain to
fail such a test. To make it work, pg_dumpall would have to start
"advance guard" transactions in each database before it takes the
intended-to-be-shared snapshot, and probably even wait for these to be
oldest. Ick.
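
A rough sketch of the test in alternative 2, under the same toy assumptions (the function names and the XID values are hypothetical, not anything in the patch). Importing is allowed only if the snapshot's xmin is not older than the DB-local oldest xmin; an "advance guard" transaction that is already open in the target database holds that value back far enough for the test to pass.

```python
from typing import List


def local_oldest_xmin(xmins_in_db: List[int], next_xid: int) -> int:
    """Toy DB-local cutoff: the oldest xmin held in this database,
    or next_xid when nothing is running there."""
    return min(xmins_in_db, default=next_xid)


def may_import(snapshot_xmin: int, xmins_in_db: List[int],
               next_xid: int) -> bool:
    """Alternative 2's test: adopting the snapshot must not make the
    DB-local oldest xmin go backwards."""
    return snapshot_xmin >= local_oldest_xmin(xmins_in_db, next_xid)


if __name__ == "__main__":
    # Scenario from the message: nothing is running in Y when C starts,
    # next XID is 101, and the exported snapshot has xmin 100.
    print(may_import(100, [], 101))       # C fails the test
    # With an advance-guard transaction in Y whose xmin is 100:
    print(may_import(100, [100], 101))    # C passes the test
```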

3. Remove the optimization that lets GetOldestXmin ignore XIDs outside
the current database. This sounds bad, but OTOH I don't think there's
ever been any proof that this optimization is worth much in real-world
usage. We've already had to lobotomize that optimization for walsender
processes, anyway.

4. Somehow mark the xmin of a process that has exported a snapshot so
that it will be honored in all DBs not just the current one. The
difficulty here is that we'd need to know *at the time the snap is
taken* that it's going to be exported. (Consider the scenario above,
except that A doesn't get around to exporting the snapshot it took in
step 3 until between steps 5 and 6. If the xmin wasn't already marked
as globally applicable when vacuum looked at it in step 5, we lose.)
This is do-able but it will contort the user-visible API of the sync
snapshots feature. One way we could do it is to require that
transactions that want to export snapshots set a transaction mode
before they take their first snapshot.
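
The timing constraint in alternative 4 can be illustrated the same way. In this sketch the `xmin_is_global` flag and every other name are hypothetical; the point is only that the flag helps if it is set before vacuum computes its cutoff, and setting it at export time is too late.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    """Hypothetical backend entry with an 'honor my xmin everywhere' flag."""
    database: str
    xmin: Optional[int] = None
    xmin_is_global: bool = False


def oldest_xmin_for(database: str, backends: List[Backend],
                    next_xid: int) -> int:
    """Toy cutoff for `database`: considers local backends plus any
    backend in another database whose xmin is marked global."""
    candidates = [
        b.xmin for b in backends
        if b.xmin is not None
        and (b.database == database or b.xmin_is_global)
    ]
    return min(candidates, default=next_xid)


def vacuum_cutoff_in_y(mark_before_snapshot: bool) -> int:
    next_xid = 101                      # B (assumed XID 100) has ended
    # A, in database X, took its snapshot while B was running.
    a = Backend("X", xmin=100, xmin_is_global=mark_before_snapshot)
    # Step 5: vacuum in Y computes its cutoff.
    cutoff = oldest_xmin_for("Y", [a], next_xid)
    # A exports only now, between steps 5 and 6. Marking the xmin at
    # this point cannot change the cutoff vacuum already computed.
    a.xmin_is_global = True
    return cutoff


if __name__ == "__main__":
    print("marked before first snapshot:", vacuum_cutoff_in_y(True))
    print("marked only at export time:", vacuum_cutoff_in_y(False))
```

When the mark is in place before the snapshot is taken, the cutoff in Y stays at A's xmin; when it is applied only at export time, vacuum has already used the later cutoff.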

Thoughts, better ideas?

regards, tom lane
