Re: 7.4.5 losing committed transactions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 7.4.5 losing committed transactions
Date: 2004-09-24 22:37:05
Message-ID: 5883.1096065425@sss.pgh.pa.us
Lists: pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> Is it somehow possible that the commit record was still sitting in the
> shared WAL buffers (unwritten) when the response got sent to the client?

I don't think so. What I see in the two cases I have now is:

(1) The backend that was doing the "lost" transaction is *not* the one
I kill -9'd. I know this in both cases because I know which table has
the missing entries, and I can see that that instance of the script got
a "WARNING: terminating connection because of crash of another server
process" message rather than just a connection closure.

(2) There's a pretty fair distance in the WAL log between the entries
made by the "lost" transaction and the checkpoint made by recovery ---
a dozen or so other transactions were made and committed in between.
It seems unlikely that this transaction would have been the only one to
lose a WAL record if something like that had happened.
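For readers following along, Jan's hypothesis would require the server to acknowledge a COMMIT before WAL had been flushed past the commit record. The toy model below is a minimal sketch of the ordering that should prevent that, written under my own assumptions: it is not PostgreSQL's xlog.c, and `ToyWal`, `toy_wal_insert`, `toy_wal_flush`, `toy_commit` and the 32-byte record size are all hypothetical. It also shows why point (2) matters: flushing is done "up to" a position, so any later commit that reached disk carries every earlier record with it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model of a WAL stream; NOT PostgreSQL's xlog.c. */
typedef uint64_t Lsn;

typedef struct {
    Lsn insert_lsn;  /* end of the last record placed in the shared buffers */
    Lsn flushed_lsn; /* everything before this position is durable on disk */
} ToyWal;

/* Append a record of `len` bytes to the in-memory buffers; return its end position. */
static Lsn toy_wal_insert(ToyWal *wal, Lsn len) {
    wal->insert_lsn += len;
    return wal->insert_lsn;
}

/* Pretend to write and fsync the buffers so that at least `upto` is durable.
 * WAL is sequential, so this flushes everything inserted so far, including
 * all records that precede `upto`. */
static void toy_wal_flush(ToyWal *wal, Lsn upto) {
    if (wal->flushed_lsn < upto)
        wal->flushed_lsn = wal->insert_lsn;
}

/* Commit: insert the commit record (arbitrary 32 bytes here), then flush
 * through it. Returns 1 once it would be safe to report success to the client. */
static int toy_commit(ToyWal *wal) {
    Lsn commit_end = toy_wal_insert(wal, 32);
    toy_wal_flush(wal, commit_end);
    /* Invariant: never acknowledge a commit whose record is not yet durable. */
    assert(wal->flushed_lsn >= commit_end);
    return wal->flushed_lsn >= commit_end;
}
```

Usage: insert a 100-byte data record, then call `toy_commit()`; both the data record and the commit record end up at or below `flushed_lsn` (132) before the function returns.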

What I'm currently speculating about is that there might be some
weirdness associated with the very act of sending out the WARNING.
quickdie() does nothing to ensure that the system is in a sane state
before it calls ereport, which is probably not so cool considering that
it runs as a signal handler. It might be wise to reset at least the
elog.c state before doing this.
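To make the signal-handler concern concrete, here is a minimal POSIX sketch of the more conservative discipline: a handler that touches only async-signal-safe calls (`write(2)` and `_exit(2)`) instead of a logging layer like ereport, which may take locks, allocate memory, or find its own state half-updated by the code it interrupted. This is not PostgreSQL's quickdie(); `toy_quickdie` and `install_toy_quickdie` are hypothetical names, and using SIGQUIT assumes that is the signal the postmaster sends to the surviving backends.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler; NOT PostgreSQL's quickdie(). */
static void toy_quickdie(int signo) {
    /* Fixed text: formatting at signal time (e.g. snprintf) is not
     * guaranteed to be async-signal-safe. */
    static const char msg[] =
        "WARNING: terminating connection because of crash of another server process\n";
    (void) signo;
    /* write(2) and _exit(2) are on the POSIX async-signal-safe list;
     * printf, malloc and most logging layers are not. */
    ssize_t rc = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) rc;
    /* _exit skips atexit handlers and stdio flushing, which could otherwise
     * run on inconsistent state. */
    _exit(2);
}

/* Install the handler for SIGQUIT; returns 0 on success, -1 on error. */
static int install_toy_quickdie(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = toy_quickdie;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGQUIT, &sa, NULL);
}
```

The trade-off is that the message can no longer carry formatted context; the upside is that the handler cannot deadlock or corrupt state that the interrupted code was in the middle of updating.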

Can you still reproduce the problem if you take out the ereport call
in quickdie()?

BTW, what led you to develop this test setup ... had you already seen
something that made you suspect a data loss problem?

regards, tom lane
