On Sat, 2009-11-07 at 11:36 +0100, Heikki Linnakangas wrote:
> Me & Simon got talking about the difficulty of doing parallel pg_dump,
> where when you open multiple connections you must somehow ensure that
> all the connections use the same snapshot so that you get a consistent
> dump. We came up with a pretty simple way to do that:
> 1. Open N+1 connections to the server
> 2. In one of them, grab ProcArrayLock in shared mode
> 3. In all other connections, begin a (serializable) transaction.
> 4. Release ProcArrayLock.
> Because we're holding the ProcArrayLock across steps 2-4, all the
> connections will get the same snapshot at step 3. That's exactly what
> we want for doing a parallel pg_dump.
> A difficulty with that is that we need some way to hold a lwlock until
> the client tells to release it. You can't hold a lwlock over command
> boundaries. But that's surely solvable, e.g by sleeping in the backend
> with the lock held until signaled by another backend. With a timeout to
> make sure we don't block indefinitely if the client crashes or something.
How about this:
* In parent session, run synchronize_snapshots, which grabs
ProcArrayLock and sets a ref count to N, then waits until the ref count
is 0 before releasing ProcArrayLock. No need to hold the lock across a
command boundary.
* In N child sessions, begin a serializable xact, then run a function
which decrements the ref count.
We protect the ref count using a spinlock. The ref count is given a
name, so that we can tell apart concurrent requests for
synchronize_snapshots().
--
Simon Riggs
www.2ndQuadrant.com