Re: BUG #6497: Error sent to client, but data written anyway

From: Ryan Lowe <rlowe(at)pablowe(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6497: Error sent to client, but data written anyway
Date: 2012-02-29 17:26:13
Message-ID: CADsh6kvZ4FhDBmeh0A=2RtYgtMtCxNNe9sb0hp=TJMccCX0XHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thanks for all the responses, but I think I'm being unclear here. Let's
walk through this case step-by-step. I start with a happy instance of
Postgres:

<pre>
% pstree -anpcul postgres
postgres,1944 -D /data/pgsql # Main process
├─postgres,1948 # Logger process
├─postgres,1950 # Writer process
├─postgres,1952 # WAL writer process
├─postgres,1953 # Auto vacuum launcher process
├─postgres,1954 # Archiver process
├─postgres,1955 # Stats collector process
├─postgres,1963 # WAL sender process
</pre>

(formatted for readability)

To mimic the failure scenario I am describing, let's start by creating a
table and opening a transaction:

<pre>
postgres=# CREATE TABLE test.t1 (c1 int);
postgres=# BEGIN; INSERT INTO test.t1 (c1) VALUES (1);
</pre>

At this point (in another shell), let's mimic a crash of the main process:

<pre>
% kill -9 1944 # Note this was postgres -D /data/pgsql from the pstree above
<pre>

Now let's see what is running:

<pre>
% kill -9 1944
% pstree -anpcul postgres
postgres,1948 # logger process
postgres,1950 # writer process
postgres,1955 # stats collector process
</pre>

The processes have all been adopted by init (expected). If we go back to
the original psql session after postgres is in the above state and complete
the transaction, this is what happens:

<pre>
postgres=# COMMIT;
The connection to the server was lost. Attempting reset: Failed.
</pre>

Horray. We've received an error because the transaction couldn't be
committed, right? Let's verify:

<pre>
% service postgresql-9.1 start
% psql
postgres=# SELECT * FROM test.t1;
c1
------
1
(1 row)
</pre>

Clearly this is not correct behavior. There has been talk in this thread
that the application should simply always try and validate that its
transactions have in fact failed, but that is not a feasible solution (for
many reasons). Thoughts?

-- Ryan Lowe

On Tue, Feb 28, 2012 at 2:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Huh? If the backend dumped core before you sent it the commit,
> >> the data will certainly not be committed. It might be physically
> >> present on disk, but it won't be considered valid.
>
> > I suppose that there is a window of time between the commit becoming
> > effective and the return to the application which, from a user
> > perspective, would be hard to distinguish from what the OP
> > described. I don't really see how that can be avoided, though,
> > short of a transaction manager using 2PC. Any time the server
> > crashes while a COMMIT is pending, one must check to see whether it
> > "took".
>
> Yeah, a post-commit crash leaves you in doubt about whether the commit
> "took", but you get into byzantine-generals issues as soon as you try to
> design that out. Consider the possibility that the commit went fine but
> the network eats the reply packet saying so. Really, any application
> that is constructed to reconnect and retry has got to have logic to test
> whether the commit occurred, not just assume that it did or didn't.
>
> (2PC is hardly a magic bullet for this, either.)
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message nish2575 2012-02-29 18:11:28 BUG #6498: with recursive / union all
Previous Message Tom Lane 2012-02-29 16:02:56 Re: BUG #6494: Listening to * fails for IP V6