Re: [mail] Re: Big 7.4 items - Replication

From: "Al Sutton" <al(at)alsutton(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Darren Johnson" <darren(at)up(dot)hrcoxmail(dot)com>, "Jan Wieck" <JanWieck(at)Yahoo(dot)com>, <shridhar_daithankar(at)persistent(dot)co(dot)in>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [mail] Re: Big 7.4 items - Replication
Date: 2002-12-14 18:18:10
Message-ID: 06b801c2a39d$29915ae0$0100a8c0@cloud
Lists: pgsql-hackers

I see it as very difficult to avoid a two-stage process, because there will
be the following two parts to any transaction:

1) All databases must agree upon the acceptability of a transaction before
the client can be informed of its success.
2) All databases must be informed as to whether or not the transaction was
accepted by the entire replicant set, and thus whether it should be written
to the database.

If stage 1 is missed, then the client application may be informed of a
successful transaction which may fail when it is replicated to other
databases.

If stage 2 is missed, then databases may become out of sync because they have
accepted transactions that were rejected by other databases.
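
The two stages above map onto the vote and decision phases of a classic two-phase commit. A minimal Python sketch, assuming a hypothetical in-memory `Replica` class whose `prepare` and `finish` methods stand in for network messages, and assuming a conflict simply means another prepared transaction already holds one of the written keys:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one database in the replicant set."""
    name: str
    data: dict = field(default_factory=dict)
    locked: set = field(default_factory=set)     # keys held by prepared txns
    pending: dict = field(default_factory=dict)  # txid -> write set awaiting outcome

    def prepare(self, txid, writes):
        """Stage 1 vote: refuse if another prepared txn holds any written key."""
        keys = set(writes)
        if self.locked & keys:
            return False
        self.locked |= keys
        self.pending[txid] = writes
        return True

    def finish(self, txid, accepted):
        """Stage 2: apply or discard the write set, then release its keys."""
        writes = self.pending.pop(txid, None)
        if writes is None:  # this replica voted no; nothing to undo
            return
        if accepted:
            self.data.update(writes)
        self.locked -= set(writes)


def two_phase_commit(replicas, txid, writes):
    # Stage 1: every replica must agree before the client is told "success".
    votes = [r.prepare(txid, writes) for r in replicas]
    accepted = all(votes)
    # Stage 2: every replica learns the outcome, so none keeps a rejected write.
    for r in replicas:
        r.finish(txid, accepted)
    return accepted
```

Here both stages are local method calls; across real nodes each stage is a round of messages to every replica, so every write pays two rounds before all replicas know the outcome.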

From reading the PDF on Postgres-R, I can see that one of two things will
occur:

a) There will be a central point of synchronization where conflicts will be
tested and dealt with. This is not desirable, because it will leave the
synchronization and replication processing load concentrated in one place,
which will limit scalability as well as leaving a single point of failure.

or

b) The Group Communication blob will consist of a number of processes which
need to talk to all of the others to interrogate them for changes which may
conflict with the current write that is being handled, and then issue the
transaction response. This is basically the two-phase commit solution with
the phases moved into the group communication process.
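
Option a can be pictured as one certifier object that every write set must pass through. A minimal sketch, assuming a hypothetical `CentralCertifier` that numbers commits and rejects a write set when any key it writes was overwritten after the transaction's snapshot (first-committer-wins); it illustrates the concept only and is not the mechanism described in the Postgres-R paper:

```python
class CentralCertifier:
    """Hypothetical single synchronization point for all write sets."""

    def __init__(self):
        self.commit_seq = 0   # number of write sets certified so far
        self.last_write = {}  # key -> commit_seq of the last write to it

    def certify(self, read_seq, write_keys):
        """Accept a write set unless one of its keys changed after read_seq.

        read_seq is the commit_seq at which the transaction took its snapshot.
        Returns the assigned commit number, or None on conflict.
        """
        for key in write_keys:
            if self.last_write.get(key, 0) > read_seq:
                return None  # a newer committed write beat this one
        self.commit_seq += 1
        for key in write_keys:
            self.last_write[key] = self.commit_seq
        return self.commit_seq
```

All conflict state lives in one object that every replica must consult before committing, which is the concentration of load and the single point of failure described in option a.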

I can see the possibility of using solution b and having fewer group
communication processes than databases in an attempt to simplify things, but
this would mean the loss of a number of databases if the machine running the
group communication process for that set of databases is lost.
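
Whether sharing a group communication process is acceptable comes down to failure domains. A small sketch with an invented three-database layout (the host names, database names, and the `databases_lost` helper are all hypothetical) compares one shared process with one process per database:

```python
def databases_lost(placement, gc_host_of, failed_host):
    """Return the databases cut off from the group when failed_host goes down.

    placement:  database -> host running the database (hypothetical layout)
    gc_host_of: database -> host running the GC process it relies on
    """
    return sorted(
        db for db in placement
        if placement[db] == failed_host or gc_host_of[db] == failed_host
    )


placement = {"db1": "h1", "db2": "h2", "db3": "h3"}
# All three databases share one GC process on host "gc1".
shared_gc = {"db1": "gc1", "db2": "gc1", "db3": "gc1"}
# One GC process per database, running on the database's own host.
own_gc = {"db1": "h1", "db2": "h2", "db3": "h3"}

print(databases_lost(placement, shared_gc, "gc1"))  # ['db1', 'db2', 'db3']
print(databases_lost(placement, own_gc, "h2"))      # ['db2']
```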

Al.

----- Original Message -----
From: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Al Sutton" <al(at)alsutton(dot)com>
Cc: "Darren Johnson" <darren(at)up(dot)hrcoxmail(dot)com>; "Jan Wieck"
<JanWieck(at)Yahoo(dot)com>; <shridhar_daithankar(at)persistent(dot)co(dot)in>;
"PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Sent: Saturday, December 14, 2002 4:59 PM
Subject: [mail] Re: [HACKERS] Big 7.4 items - Replication

>
> This sounds like two-phase commit. While it will work, it is probably
> slower than Postgres-R's method.
>
> ---------------------------------------------------------------------------
>
> Al Sutton wrote:
> > For live replication, could I propose that we consider systems A, B,
> > and C connected to each other independently (i.e. A has links to B and
> > C, B has links to A and C, and C has links to A and B), and that
> > replication is handled by the node receiving the write-based
> > transaction.
> >
> > If we consider a write transaction that arrives at A (called WT(A)),
> > system A will then send WT(A) to systems B and C via its direct
> > connections. System A will receive back either an OK response if there
> > are no conflicts, a NOT_OK response if there are conflicts, or no
> > response if the system is unavailable.
> >
> > If system A receives a NOT_OK response from any other node, it begins
> > the process of rolling back the transaction from all nodes which
> > previously issued an OK, and the transaction returns a failure code to
> > the client which submitted WT(A). The other systems (B and C) would
> > track recent transactions, and there would be a specified timeout
> > after which the transaction is considered safe and could not be
> > rolled out.
> >
> > Any system not returning an OK or NOT_OK state is assumed to be down,
> > and error messages are logged to state that the transaction could not
> > be sent to the system due to its unavailability, and any monitoring
> > system would alert the administrator that a replicant is faulty.
> >
> > There would also need to be code developed to ensure that a system
> > could be brought into sync with the current state of other systems
> > within the group in order to allow new databases to be added, and
> > faulty databases to be re-entered into the group. This code could also
> > be used for non-realtime replication to allow databases to be
> > synchronised with the live master.
> >
> > This would give a multi-master solution whereby a write transaction to
> > any one node would guarantee that all available replicants would also
> > hold the data once it is completed, and would also provide the code to
> > handle scenarios where non-realtime data replication is required.
> >
> > This system assumes that a majority of transactions will be successful
> > (which should be the case for a well-designed system).
> >
> > Comments?
> >
> > Al.
> >
> > ----- Original Message -----
> > From: "Darren Johnson" <darren(at)up(dot)hrcoxmail(dot)com>
> > To: "Jan Wieck" <JanWieck(at)Yahoo(dot)com>
> > Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>;
> > <shridhar_daithankar(at)persistent(dot)co(dot)in>; "PostgreSQL-development"
> > <pgsql-hackers(at)postgresql(dot)org>
> > Sent: Saturday, December 14, 2002 1:28 AM
> > Subject: [mail] Re: [HACKERS] Big 7.4 items
> >
> >
> > > >
> > > >
> > > >>
> > > >>Let's say we have systems A, B and C. Each one has some
> > > >>changes and sends a writeset to the group communication
> > > >>system (GCS). The total order dictates WS(A), WS(B), and
> > > >>WS(C) and the write sets are received in that order at
> > > >>each system. Now C gets WS(A) no conflict, gets WS(B) no
> > > >>conflict, and receives WS(C). Now C can commit WS(C) even
> > > >>before the commit messages C(A) or C(B), because there is no
> > > >>conflict.
> > > >>
> > > >
> > > >And that is IMHO not synchronous. C does not have to wait for A and B
> > > >to finish the same tasks. If now at this very moment two new
> > > >transactions query system A and system C (assuming A has not yet
> > > >committed WS(C) while C has), they will get different data back
> > > >(thanks to non-blocking reads). I think this is pretty asynchronous.
> > > >
> > >
> > > So if we hold WS(C) until we receive commit messages for WS(A) and
> > > WS(B), will that meet your synchronous expectations, or do all the
> > > systems need to commit the WS in the same order and at the same exact
> > > time?
> > >
> > > >
> > > >
> > > >It doesn't lead to inconsistencies, because the transaction on A
> > > >cannot do something that is in conflict with the changes made by
> > > >WS(C), since its WS(A)2 will come back after WS(C) arrived at A, and
> > > >thus WS(C) arriving at A will cause WS(A)2 to roll back (WS used
> > > >synonymously with Xact in this context).
> > > >
> > > Right
> > >
> > > >
> > > >Hope this doesn't add too much confusion :-)
> > > >
> > > No, however I guess I need to adjust my slides to include your
> > > definition of synchronous
> > > replication. ;-)
> > >
> > > Darren
> > >
> > > >
> > >
> > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 6: Have you searched our list archives?
> > >
> > > http://archives.postgresql.org
> > >
> >
> >
> >
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
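
Al's earlier proposal, quoted above, has the node that receives a write transaction forward it to every peer and act on three outcomes: OK, NOT_OK, or no response. A minimal Python sketch of the originating node's decision logic follows. `Reply`, `replicate_write`, and the `send`/`rollback` callables are hypothetical names, and the thread pool with a per-peer timeout is an assumption standing in for the real inter-node links and failure detection.

```python
import concurrent.futures as cf
import logging
from enum import Enum

log = logging.getLogger("replication")


class Reply(Enum):
    OK = "OK"
    NOT_OK = "NOT_OK"


def replicate_write(txn, peers, send, rollback, timeout=2.0):
    """Originating node's side of the proposal; returns (committed, down_peers).

    send(peer, txn) -> Reply and rollback(peer, txn) are hypothetical hooks
    standing in for the direct links between nodes.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(peers)))
    # Send the write transaction to every peer in parallel.
    futures = {peer: pool.submit(send, peer, txn) for peer in peers}
    accepted, rejected, down = [], [], []
    for peer, fut in futures.items():
        try:
            # Simplification: each peer's reply is awaited in turn, with its own timeout.
            reply = fut.result(timeout=timeout)
        except Exception:  # timed out or transport error: no usable response
            reply = None
        if reply is Reply.OK:
            accepted.append(peer)
        elif reply is Reply.NOT_OK:
            rejected.append(peer)
        else:
            # Neither OK nor NOT_OK: assume the peer is down and log it for monitoring.
            down.append(peer)
            log.error("could not send %s to %s: replicant unavailable", txn, peer)
    pool.shutdown(wait=False)
    if rejected:
        # Any NOT_OK: roll back on every peer that said OK and report failure.
        for peer in accepted:
            rollback(peer, txn)
        return False, down
    return True, down
```

Treating silence as "down" rather than as a veto lets the originator finish without every replica, which is why the proposal also calls for code to bring a recovered or new database back into sync. The peer-side window after which B and C treat a transaction as safe from rollback is not modelled in this sketch.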

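The exchange between Darren and Jan quoted above assumes that the group communication system delivers every write set to every node in the same total order, so each node can decide locally which write sets to apply. A minimal sketch of that idea, assuming each write set records how many delivered write sets its transaction had already seen and that a conflict means overlapping keys; `WriteSet` and `certify_in_order` are hypothetical names, not Postgres-R's actual code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteSet:
    name: str        # e.g. "WS(A)"
    snapshot: int    # number of delivered write sets the txn had already seen
    keys: frozenset  # rows the transaction wrote


def certify_in_order(delivered):
    """Apply write sets in total order; abort any that overlap an earlier
    committed write set its snapshot did not include.

    Every node runs this on the same sequence, so all nodes reach the same
    commit/abort decisions without exchanging votes.
    """
    applied = []  # (position, keys) of committed write sets
    decisions = {}
    for pos, ws in enumerate(delivered):
        conflict = any(
            p >= ws.snapshot and (keys & ws.keys)
            for p, keys in applied
        )
        decisions[ws.name] = "abort" if conflict else "commit"
        if not conflict:
            applied.append((pos, ws.keys))
    return decisions


# WS(A)2 was built on A before WS(C) arrived there and writes the same row.
order = [
    WriteSet("WS(A)", 0, frozenset({"a"})),
    WriteSet("WS(B)", 0, frozenset({"b"})),
    WriteSet("WS(C)", 0, frozenset({"c"})),
    WriteSet("WS(A)2", 2, frozenset({"c"})),
]
print(certify_in_order(order))
# {'WS(A)': 'commit', 'WS(B)': 'commit', 'WS(C)': 'commit', 'WS(A)2': 'abort'}
```

Because the outcome depends only on the delivery order, C can commit WS(C) as soon as it has checked it against WS(A) and WS(B), which is the behaviour Jan calls asynchronous and Darren asks about holding back.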