Re: Re: AW: Re: MySQL and BerkleyDB (fwd)

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>, "'dom(at)idealx(dot)com'" <dom(at)idealx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: AW: Re: MySQL and BerkleyDB (fwd)
Date: 2001-01-25 03:36:58
Message-ID: 200101250336.WAA00898@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Added to TODO.detail/replication.

> [ Charset ISO-8859-1 unsupported, converting... ]
> > > I had thought that the pre-commit information could be stored in an
> > > auxiliary table by the middleware program ; we would then have
> > > to re-implement some sort of higher-level WAL (I thought of the list
> > > of the commands performed in the current transaction, with a sequence
> > > number for each of them that would guarantee correct ordering between
> > > concurrent transactions in case of a REDO). But I fear I am missing
> >
> > This wouldn't work for READ COMMITTED isolation level.
> > But why do you want to log commands into WAL where each modification
> > is already logged in, hm, correct order?
> > Well, it has sense if you're looking for async replication but
> > you need not in two-phase commit for this and should aware about
> > problems with READ COMMITTED isolevel.
> >
>
> I believe the issue here is that while SERIALIZABLE ISOLATION means all
> queries can be run serially, our default is READ COMMITTED, meaning that
> open transactions see committed transactions, even if the transaction
> committed after our transaction started. (FYI, see my chapter on
> transactions for help, http://www.postgresql.org/docs/awbook.html.)
>
> To do higher-level WAL, you would have to record not only the queries,
> but the other queries that were committed at the start of each command
> in your transaction.
>
> Ideally, you could number every commit by its XID your log, and then
> when processing the query, pass the "committed" transaction ids that
> were visible at the time each command began.
>
> In other words, you can replay the queries in transaction commit order,
> except that you have to have some transactions committed at specific
> points while other transactions are open, i.e.:
>
> XID Open XIDS Query
> 500 UPDATE t SET col = 3;
> 501 500 BEGIN;
> 501 500 UPDATE t SET col = 4;
> 501 UPDATE t SET col = 5;
> 501 COMMIT;
>
> This is a silly example, but it shows that 500 must commit after the
> first command in transaction 501, but before the second command in the
> transaction. This is because UPDATE t SET col = 5 actually sees the
> changes made by transaction 500 in READ COMMITTED isolation level.
>
> I am not advocating this. I think WAL is a better choice. I just
> wanted to outline how replaying the queries in commit order is
> insufficient.
>
> > Back to two-phase commit - it's easiest part of work required for
> > distributed transaction processing.
> > Currently we place single commit record to log and transaction is
> > committed when this record (and so all other transaction records)
> > is on disk.
> > Two-phase commit:
> >
> > 1. For 1st phase we'll place into log "prepared-to-commit" record
> > and this phase will be accomplished after record is flushed on disk.
> > At this point transaction may be committed at any time because of
> > all its modifications are logged. But it still may be rolled back
> > if this phase failed on other sites of distributed system.
> >
> > 2. When all sites are prepared to commit we'll place "committed"
> > record into log. No need to flush it because of in the event of
> > crash for all "prepared" transactions recoverer will have to
> > communicate other sites to know their statuses anyway.
> >
> > That's all! It is really hard to implement distributed lock- and
> > communication- managers but there is no problem with logging two
> > records instead of one. Period.
>
> Great.
>
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2001-01-25 03:53:34 TODO.deail
Previous Message Bruce Momjian 2001-01-25 03:36:39 Re: Re: AW: Re: MySQL and BerkleyDB (fwd)