Re: Change pg_last_xlog_receive_location not to move backwards

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Change pg_last_xlog_receive_location not to move backwards
Date: 2011-01-31 07:12:21
Message-ID: AANLkTinrwPqu8q1oJ9muc+Z0o4qBJhc1RQXD=xZjDdxF@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 30, 2011 at 10:44 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> I do not understand what doing so gets us.
>
> Say we previously received 2/3 of a WAL file, and replayed most of it.
> So now the shared buffers have data that has been synced to that WAL
> file already, and some of those dirty shared buffers have been written
> to disk and some have not.  At this point, we need the data in the first
> 2/3 of the WAL file in order to reach a consistent state.  But now we
> lose the connection to the master, and then we restore it.  Now we
> request the entire file from the start rather than from where it
> left off.
>
> Either of two things happens.  Most likely, the newly received WAL file
> matches the file it is overwriting, in which case there was no
> point in asking for it.
>
> Less likely, the master is feeding us gibberish.  By requesting the
> full WAL file, we check the header and detect that the master is feeding
> us gibberish.  Unfortunately, we only detect that fact *after* we have
> replaced a critical part of our own (previously good) copy of the WAL
> file with said gibberish.  The standby is now in an unrecoverable state.

Right. To avoid this problem completely, IMO, walreceiver should validate
the received WAL data before writing it. Alternatively, walreceiver could
write the WAL to a transient file, and the startup process could rename it
to its final name after replaying it.

Should we do something like the above?
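Just to illustrate the second idea, here is a minimal sketch; the ".recv.tmp"
naming under pg_xlog, the function names, and the error handling are my
assumptions for the example, not actual walreceiver code:

    /*
     * Illustrative sketch of the "transient file" idea (not real walreceiver
     * code): received WAL is written under a temporary name and only renamed
     * into place once the startup process has replayed it.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write a received chunk of WAL into a transient file in pg_xlog. */
    static int
    write_wal_transient(const char *segname, const char *buf, size_t len, off_t off)
    {
        char    tmppath[1024];
        int     fd;

        snprintf(tmppath, sizeof(tmppath), "pg_xlog/%s.recv.tmp", segname);

        fd = open(tmppath, O_WRONLY | O_CREAT, 0600);
        if (fd < 0)
            return -1;
        if (pwrite(fd, buf, len, off) != (ssize_t) len || fsync(fd) != 0)
        {
            close(fd);
            return -1;
        }
        return close(fd);
    }

    /* After the startup process has replayed the segment, move it into place. */
    static int
    promote_wal_transient(const char *segname)
    {
        char    tmppath[1024];
        char    path[1024];

        snprintf(tmppath, sizeof(tmppath), "pg_xlog/%s.recv.tmp", segname);
        snprintf(path, sizeof(path), "pg_xlog/%s", segname);

        /* rename() is atomic, so a crash leaves either the old or the new file. */
        return rename(tmppath, path);
    }

The point of the rename step is that a partially received segment can never
be mistaken for a complete one.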

> With a bit of malicious engineering, I have created this situation.
> I don't know how likely it is that something like that could happen
> accidentally, say with a corrupted file system.  I have been unable
> to engineer a situation where checking the header actually does
> any good.  It has either done nothing, or done harm.

OK, it seems I need to reconsider why the code that moves the replication
starting location backwards exists.

At first, I added the code to prevent a half-baked WAL file. For example,
imagine the case where you start the standby server with no WAL files in
pg_xlog. In this case, if replication starts from the middle of a WAL file,
the received WAL file is obviously broken (i.e., with no WAL data in the
first half of the file). This broken WAL file might cause trouble when we
restart the standby, and even when we just promote it (because the last
applied WAL file is re-fetched at the end of recovery).

OTOH, we can start replication from the middle of a WAL file if we can
*ensure* that the first half of the WAL file already exists. At least when
the standby reconnects to the master, we might be able to ensure that and
start from the middle.
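For illustration, the decision this implies looks roughly like the sketch
below; the simplified 64-bit XLogRecPtr, the segment-size constant, and the
helper name are assumptions for the example, not the real code (which uses a
two-field record pointer):

    /*
     * Sketch only: choose where to restart streaming.  If none of the current
     * segment is known to be valid locally, retreat to the segment boundary so
     * the whole file is re-fetched; if the first half is already on disk, we
     * could safely resume from the last received location.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define XLOG_SEG_SIZE   (16 * 1024 * 1024)      /* 16MB WAL segments */

    typedef uint64_t XLogRecPtr;    /* simplified for the example */

    static XLogRecPtr
    choose_stream_start(XLogRecPtr last_received, bool first_half_on_disk)
    {
        if (first_half_on_disk)
        {
            /* The earlier part of the segment already exists locally,
             * so streaming can resume exactly where it left off. */
            return last_received;
        }

        /* Otherwise request the whole segment from its beginning, so the
         * resulting file never has a hole in its first half. */
        return last_received - (last_received % XLOG_SEG_SIZE);
    }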

> OK, thanks for the explanation.  Is there a race condition here?  That is,
> it seems that with your patch this part of the code could get executed
> after streaming is restarted, but before the streaming ever actually received
> and wrote anything.  So it would be checking the old header, not the newly
> about-to-be received header.

As far as I can tell from re-reading the code, there is no such race
condition. When the received WAL is read just after streaming is restarted,
that part of the code always seems to be executed.

Maybe I'm missing something. Could you explain the problematic scenario?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
