| From: | "Ed L(dot)" <pgsql(at)bluepolka(dot)net> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Re: 32/64-bit transaction IDs? | 
| Date: | 2003-03-22 19:57:01 | 
| Message-ID: | 200303221257.01205.pgsql@bluepolka.net | 
| Lists: | pgsql-general | 
On Saturday March 22 2003 12:00, Tom Lane wrote:
> "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > On Saturday March 22 2003 11:29, Tom Lane wrote:
> >> I think this last part is wrong.  It shouldn't be using the xid as
> >> part of the ordering, only the sequence value.
> >
> > Why not?  How would you replay them on the slave in the same
> > transaction groupings and order that occurred on the master?
>
> Why not is easy:
>
> 	begin xact 1;
> 	update tuple X;
> 	...
> 					begin xact 2;
> 					update tuple Y;
> 					commit;
> 	...
> 	update tuple Y;
> 	commit;
>
> (Note that this is only possible in read-committed mode, else xact 1
> wouldn't be allowed to update tuple Y like this.)  Here, you must
> replay xact 1's update of Y after xact 2's update of Y, else you'll
> get the wrong final state on the slave.  On the other hand, it really
> does not matter whether you replay the tuple X update before or after
> you replay xact 2, because xact 2 didn't touch tuple X.
>
> If the existing DBmirror code is sorting as you say, then it will fail
> in this scenario --- unless it always manages to execute a propagation
> step in between the commits of xacts 2 and 1, which doesn't seem very
> trustworthy.
Well, I'm not absolutely certain, but I think this problem may indeed exist 
in dbmirror.  If I'm reading it correctly, dbmirror basically has the 
following:
	create table xact_queue (xid int, seqid serial, ...);
	create table tuple_queue (seqid int, data, ...);
The dbmirror trigger does this:
	myXid = GetCurrentTransactionId();
	insert into xact_queue (xid, seqid) values (myXid, nextval('seqid_seq'));
	insert into tuple_queue (seqid, data, ...) values (currval('seqid_seq'), ...);
The slave then grabs all queued xids in order of the max seqid within each 
transaction.  Essentially,
	SELECT xid, MAX(seqid)
	FROM xact_queue
	GROUP BY xid
	ORDER BY MAX(seqid);
In your scenario it would order them xact1, then xact2, since xact 1's 
update of Y would have the max seqid.  For each xact, it replays the tuples 
for that xact in seqid order.
	SELECT t.seqid, t.data, ...
	FROM tuple_queue t, xact_queue x
	WHERE t.seqid = x.seqid
	    AND x.xid = $XID
	ORDER BY t.seqid;
So the actual replay order would be
	xact1: update X
	xact1: update Y
	xact2: update Y
leading to slave inconsistency.
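To make the divergence concrete, here is a small Python sketch that applies the writes once in the order they happened on the master and once in the grouped replay order listed above. The dict-based model of a table and the placeholder values such as "y-from-xact1" are assumptions for illustration only; none of this is dbmirror code.

```python
# Hypothetical model: a table is a dict from tuple name to its latest value.

def apply_writes(writes):
    """Apply (xact, tuple_name, value) writes in order; return the final state."""
    state = {}
    for _xact, name, value in writes:
        state[name] = value
    return state

# Writes in the order they actually happened on the master (Tom's scenario).
master_writes = [
    ("xact1", "X", "x-from-xact1"),
    ("xact2", "Y", "y-from-xact2"),
    ("xact1", "Y", "y-from-xact1"),
]

# The grouped replay order described above: all of xact1, then all of xact2.
grouped_replay = [
    ("xact1", "X", "x-from-xact1"),
    ("xact1", "Y", "y-from-xact1"),
    ("xact2", "Y", "y-from-xact2"),
]

master_state = apply_writes(master_writes)
slave_state = apply_writes(grouped_replay)

print(master_state["Y"])  # y-from-xact1
print(slave_state["Y"])   # y-from-xact2
```

Tuple X ends up the same on both sides, but tuple Y keeps xact 2's value on the slave while the master holds xact 1's later value.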
> What I'm envisioning is that you should just send updates in the order
> of their insertion sequence numbers and *not* try to force them into
> transactional grouping. ...
Very good.  Makes perfect sense to me now.  That also apparently obviates 
the need for 64-bit transaction IDs, since the sequence can be a BIGINT.
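As a rough sketch of the approach Tom describes, the following Python snippet queues the same three updates and replays them purely in sequence order, ignoring the xid. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the single-table layout, the column names, and the placeholder data values are assumptions made for the example, not the dbmirror schema.

```python
import sqlite3

# In-memory SQLite stands in for the master's queue (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tuple_queue (
        seqid      INTEGER PRIMARY KEY AUTOINCREMENT,  -- 64-bit in SQLite
        xid        INTEGER,   -- kept for reference only, not used for ordering
        tuple_name TEXT,
        data       TEXT
    )
    """
)

# Updates queued in the order they happened in Tom's scenario.
conn.executemany(
    "INSERT INTO tuple_queue (xid, tuple_name, data) VALUES (?, ?, ?)",
    [
        (1, "X", "x-from-xact1"),
        (2, "Y", "y-from-xact2"),
        (1, "Y", "y-from-xact1"),
    ],
)

# Replay strictly by insertion sequence number, with no grouping by xid.
replica = {}
for seqid, xid, name, data in conn.execute(
    "SELECT seqid, xid, tuple_name, data FROM tuple_queue ORDER BY seqid"
):
    replica[name] = data

print(replica)  # {'X': 'x-from-xact1', 'Y': 'y-from-xact1'}
```

Because the ordering key is the sequence value alone, its width is what matters for wraparound, which is why a BIGINT sequence would be enough here.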
Thanks,
Ed