Re: [HACKERS] redolog - for discussion

From: jwieck(at)debis(dot)com (Jan Wieck)
To: vadim(at)krs(dot)ru (Vadim Mikheev)
Cc: jwieck(at)debis(dot)com, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] redolog - for discussion
Date: 1998-12-16 20:07:00
Message-ID: m0zqNDo-000EBTC@orion.SAPserv.Hamburg.dsh.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Vadim wrote:

>
> Jan Wieck wrote:
> >
> > RECOVER DATABASE {ALL | UNTIL 'datetime' | RESET};
> >
> ...
> >
> > For the others, the backend starts the recovery program
> > which reads the redolog files, establishes database
> > connections as required and reruns all the commands in
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > them. If a required logfile isn't found, it tells the
> ^^^^^
>
> I foresee problems with using _commands_ logging for
> recovery/replication -:((
>
> Let's consider two concurrent updates in READ COMMITTED mode:
>
> update test set x = 2 where y = 1;
>
> and
>
> update test set x = 3 where y = 1;
>
> The result of both committed transaction will be x = 2
> if the 1st transaction updated row _after_ 2nd transaction
> and x = 3 if the 2nd transaction gets row after 1st one.
> Order of updates is not defined by order in which commands
> begun and so order in which commands should be rerun
> will be unknown...

Yepp, the order in which commands begun is absolutely not of
interest. Locking could already delay the execution of one
command until another one started later has finished and
released the lock. It's a classic race condition.

Thus, my plan was to log the queries just before the call to
CommitTransactionCommand() in tcop. This has the advantage,
that queries which bail out with errors don't get into the
log at all and must not get rerun. And I can set a static
flag to false before starting the command, which is set to
true in the buffer manager when a buffer is written (marked
dirty), so filtering out queries that do no updates at all is
easy.

Unfortunately query level logging get's hit by the current
implementation of sequence numbers. If a query that get's
aborted somewhere in the middle (maybe by a trigger) called
nextval() for rows processed earlier, the sequence number
isn't advanced at recovery time, because the query is
suppressed at all. And sequences aren't locked, so for
concurrently running queries getting numbers from the same
sequence, the results aren't reproduceable. If some
application selects a value resulting from a sequence and
uses that later in another query, how could the redolog know
that this has changed? It's a Const in the query logged, and
all that corrupts the whole thing.

All that is painful and I don't see another solution yet than
to hook into nextval(), log out the numbers generated in
normal operation and getting back the same numbers in redo
mode.

The whole thing gets more and more complicated :-(

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck(at)debis(dot)com (Jan Wieck) #

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Cd Chen 1998-12-16 21:22:47 Have any ideas to support GNU gettext package ??
Previous Message Bruce Momjian 1998-12-16 18:00:34 Re: [HACKERS] MVCC works in serialized mode!