Re: Proposal: Snapshot cloning

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Snapshot cloning
Date: 2007-01-26 07:19:16
Message-ID: 1169795956.3368.8.camel@localhost.localdomain
Lists: pgsql-hackers

On Thu, 2007-01-25 at 22:19, Jan Wieck wrote:
> Granted, this one has a few open ends so far, and I'd like to receive some
> constructive input on how to actually implement it.
>
> The idea is to clone an existing serializable transaction's snapshot
> visibility information from one backend to another. The semantics would
> be like this:
>
> backend1: start transaction;
> backend1: set transaction isolation level serializable;
> backend1: select pg_backend_pid();
> backend1: select publish_snapshot(); -- will block
>
> backend2: start transaction;
> backend2: set transaction isolation level serializable;
> backend2: select clone_snapshot(<pid>); -- will unblock backend1
>
> backend1: select publish_snapshot();
>
> backend3: start transaction;
> backend3: set transaction isolation level serializable;
> backend3: select clone_snapshot(<pid>);
>
> ...
>
> This will allow a number of separate backends to assume the same MVCC
> visibility, so that they can query independently, but the overall result
> will be according to one consistent snapshot of the database.

I see uses for this in implementing query parallelism in user-level
code, like querying two child tables in two separate processes.
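
A minimal sketch of that pattern, using the proposed
publish_snapshot()/clone_snapshot() functions (the parent table t and
its children t_a and t_b are made up for illustration):

backend1: start transaction;
backend1: set transaction isolation level serializable;
backend1: select pg_backend_pid();     -- say it returns 1234
backend1: select publish_snapshot();   -- blocks until cloned

backend2: start transaction;
backend2: set transaction isolation level serializable;
backend2: select clone_snapshot(1234); -- unblocks backend1
backend2: select count(*) from t_a;

backend1: select count(*) from t_b;

Rows committed by other sessions after backend1's snapshot was taken
are invisible to both backends, so the two partial counts add up to a
count(*) of t that is consistent with a single snapshot.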

> What I try to accomplish with this is to widen a bottleneck that many
> current Slony users are facing. The initial copy of a database is
> currently limited to one single reader to copy a snapshot of the data
> provider. With the above functionality, several tables could be copied
> in parallel by different client threads, feeding separate backends on
> the receiving side at the same time.

I'm afraid that for most configurations this would make the copy slower,
as there will be more random disk I/O.

Maybe it would be better to fix Slony so that it allows initial copies
in different parallel transactions, or just do the initial copy in
several sets and merge the sets later.
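
For example, each table could be copied in its own ordinary
transaction, with no shared snapshot needed (a rough sketch; t_a and
t_b are made-up table names):

backend1: start transaction;
backend1: copy t_a from stdin;
backend1: commit;

backend2: start transaction;
backend2: copy t_b from stdin;
backend2: commit;

The tables are then copied as of different points in time, and it is
the ongoing log apply (or the later merge of the sets) that brings
them to a mutually consistent state.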

> The feature could also be used by a parallel version of pg_dump as well
> as data mining tools.
>
> The cloning process needs to make sure that the clone_snapshot() call is
> made by the same DB user in the same database as the corresponding
> publish_snapshot() call.

Why? A snapshot is universal and the same for the whole db instance, so
why limit it to the same user/database?

> Since publish_snapshot() only
> publishes information that it gained legally and that is visible in the
> PGPROC shared memory (xmin, xmax being the crucial part here), there is
> no risk of creating a snapshot for which data might already have been
> removed by vacuum.
>
> What I am not sure about yet is what IPC method would best suit the
> transfer of the arbitrarily sized xip vector. Ideas?
>
>
> Jan
>
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
