Exporting Snapshots

From: Markus Wanner <markus(at)bluegap(dot)ch>
To: pgsql-cluster-hackers(at)postgresql(dot)org
Cc: Joachim Wieland <joe(at)mcknight(dot)de>
Subject: Exporting Snapshots
Date: 2010-02-06 07:50:38
Message-ID: 4B6D1F4E.7070104@bluegap.ch
Lists: pgsql-cluster-hackers

Hi,

the very first item on the ClusterFeatures [1] wishlist is "Export
snapshots to other sessions". Joachim Wieland has recently sent a
patch to -hackers [2] which he calls "Synchronized Snapshots". That
sounded similar enough that I reviewed it.

That patch doesn't really "export" a snapshot, though; it merely
ensures that several transactions start out with the same snapshot.
Afterwards they can each do whatever they want, including writing data
and committing or aborting independently.

But for any kind of parallel querying (be it on the same or across
multiple nodes) we need to be able to export a snapshot of a transaction
to another backend - from any point in time of the origin transaction.

This includes the full XIP array (the list of transactions that were in
progress at the time of snapshot creation), as well as making sure that
data already written (but not yet committed) by the origin transaction
is visible to the destination backend. The latter is a no-op on a
single node, but needs care for remote backends.
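To make this concrete, here is a minimal sketch of what exporting the
XIP data and testing visibility against it might look like. This is
deliberately *not* PostgreSQL's actual SnapshotData, serialization, or
XidInMVCCSnapshot code; the types and function names (SimpleSnapshot,
snapshot_export, xid_visible) are invented for illustration, and the
visibility test ignores subtransactions and the clog lookup a real
backend would need:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* A deliberately simplified snapshot: the real SnapshotData carries
 * more (subxip, curcid, ...); this keeps only what the text mentions:
 * xmin, xmax and the XIP array. */
typedef struct {
    TransactionId xmin;     /* all xids below this have finished */
    TransactionId xmax;     /* all xids at or above this are invisible */
    uint32_t      xcnt;     /* number of in-progress xids */
    TransactionId xip[64];  /* xids in progress at snapshot time */
} SimpleSnapshot;

/* Flatten the snapshot into a byte buffer that could be handed to
 * another backend (shared memory locally, a socket for remote nodes).
 * Returns the number of bytes written. */
static size_t snapshot_export(const SimpleSnapshot *snap, char *buf)
{
    size_t off = 0;
    memcpy(buf + off, &snap->xmin, sizeof snap->xmin); off += sizeof snap->xmin;
    memcpy(buf + off, &snap->xmax, sizeof snap->xmax); off += sizeof snap->xmax;
    memcpy(buf + off, &snap->xcnt, sizeof snap->xcnt); off += sizeof snap->xcnt;
    memcpy(buf + off, snap->xip, snap->xcnt * sizeof(TransactionId));
    return off + snap->xcnt * sizeof(TransactionId);
}

/* Reconstruct the snapshot in the destination backend. */
static void snapshot_import(SimpleSnapshot *snap, const char *buf)
{
    size_t off = 0;
    memcpy(&snap->xmin, buf + off, sizeof snap->xmin); off += sizeof snap->xmin;
    memcpy(&snap->xmax, buf + off, sizeof snap->xmax); off += sizeof snap->xmax;
    memcpy(&snap->xcnt, buf + off, sizeof snap->xcnt); off += sizeof snap->xcnt;
    memcpy(snap->xip, buf + off, snap->xcnt * sizeof(TransactionId));
}

/* Simplified MVCC visibility: an xid's effects are invisible if the
 * transaction was still in progress (or not yet started) when the
 * snapshot was taken. (Real code must also check the clog to see
 * whether an xid below xmax actually committed.) */
static int xid_visible(const SimpleSnapshot *snap, TransactionId xid)
{
    if (xid >= snap->xmax)
        return 0;
    if (xid < snap->xmin)
        return 1;
    for (uint32_t i = 0; i < snap->xcnt; i++)
        if (snap->xip[i] == xid)
            return 0;   /* in progress at snapshot time */
    return 1;
}
```

The point of the flat-buffer round trip is that the same bytes work
whether the destination backend sits on the same node or on a remote
one; only the transport differs.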

Additionally, some access controlling information needs to be
transferred, to ensure parallel querying isn't a security hole.
Joachim's patch currently circumvents this issue by requiring superuser
privileges.

A worker backend for parallel querying should never need to write any
data, so it should be forced into read-only mode. And I'd say the
origin transaction should not be allowed to continue with another query
before having "collected" all worker backends that attached to its
snapshot. So we have yet another difference from Joachim's approach:
the transactions continuing independently versus the workers being
bound to the origin transaction.
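The "collecting" constraint amounts to simple reference counting: the
origin blocks until every worker that attached to its snapshot has
detached again. The following sketch is purely illustrative, using
POSIX threads in place of real backends; SnapshotRef and the function
names are invented here, not existing backend infrastructure:

```c
#include <pthread.h>

/* Hypothetical bookkeeping: the origin transaction may not proceed to
 * its next query while any worker is still attached to its snapshot. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  all_detached;
    int             attached;   /* workers currently using the snapshot */
} SnapshotRef;

/* Called by a worker backend when it imports the snapshot. */
static void worker_attach(SnapshotRef *r)
{
    pthread_mutex_lock(&r->lock);
    r->attached++;
    pthread_mutex_unlock(&r->lock);
}

/* Called by a worker backend when it is done with the snapshot. */
static void worker_detach(SnapshotRef *r)
{
    pthread_mutex_lock(&r->lock);
    if (--r->attached == 0)
        pthread_cond_signal(&r->all_detached);
    pthread_mutex_unlock(&r->lock);
}

/* Called by the origin before it may run its next query: block until
 * all workers have detached from the snapshot. */
static void origin_collect_workers(SnapshotRef *r)
{
    pthread_mutex_lock(&r->lock);
    while (r->attached > 0)
        pthread_cond_wait(&r->all_detached, &r->lock);
    pthread_mutex_unlock(&r->lock);
}
```

In Joachim's scheme there is no such barrier at all; here it is the
mechanism that binds the workers to the origin transaction.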

I realize this is not quite the same as what Joachim has in mind for
parallel pg_dump. It is a more general approach, which certainly also
requires more work. However, I think it could cover the requirements of
a parallel pg_dump as well.

Cluster hackers, is this a good summary which covers your needs as well?
Something that's still missing?

Joachim, would you be willing to work on such a more general approach?

Regards

Markus Wanner

[1]: feature wish list of cluster hackers:
http://wiki.postgresql.org/wiki/ClusterFeatures

[2]: Synchronized Snapshots, by Joachim Wieland
http://archives.postgresql.org/message-id/dc7b844e1001081136k12ae4eq6d1f7689ed1adfe6@mail.gmail.com
