Re: Change pg_last_xlog_receive_location not to move backwards

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Change pg_last_xlog_receive_location not to move backwards
Date: 2011-01-30 01:44:34
Message-ID: AANLkTinTWw8uGtVxSy65EFaR=3eypL-JfSZn+QeFit-v@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 25, 2011 at 6:38 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

>> The third seems more problematic.  In the XLogPageRead,
>> it checks to see if more records have been received beyond what
>> has been applied.  By using the non-retreating value here, it seems
>> like the xlog replay could start replaying records that the wal
>> receiver is in the process of overwriting.  Now, I've argued to myself
>> that this is not a problem, because the receiver is overwriting them
>> with identical data to that which is already there.
>
> Yes. I don't think that it's a problem, too.
>
>> But by that logic, why does any part of it (walrcv->receiveStart in
>> the patch, walrcv->receivedUpto unpatched) need to retreat?  From
>> the previous discussion, I understand that the concern is that we don't
>> want to retrieve and write out partial xlog files.  What I don't understand
>> is how we could get our selves into the position in which we are doing
>> that, other than by someone going in and doing impermissible things to
>> the PGDATA directory behind our backs.
>
> That logic exists because we'd like to check that newly-received WAL
> data is consistent with previous one by validating the header of new
> WAL file.

I do not understand what doing so gets us.

Say we previously received 2/3 of a WAL file, and replayed most of it.
So now the shared buffers have data that has been synced to that WAL
file already, and some of those dirty shared buffers have been written
to disk and some have not. At this point, we need the data in the first
2/3 of the WAL file in order to reach a consistent state. But now we
lose the connection to the master, and then we restore it. Now we
request the entire file from the start rather than from where it
left off.

Either of two things happens. Most likely, the newly received WAL file
matches the file it is overwriting, in which case there was no
point in asking for it.

Less likely, the master is feeding us gibberish. By requesting the
full WAL file, we check the header and detect that the master is feeding
us gibberish. Unfortunately, we only detect that fact *after* we have
replaced a critical part of our own (previously good) copy of the WAL
file with said gibberish. The standby is now in an unrecoverable state.

With a bit of malicious engineering, I have created this situation.
I don't know how likely it is that something like that could happen
accidentally, say with a corrupted file system. I have been unable
to engineer a situation where checking the header actually does
any good. It has either done nothing, or done harm.

> So since we need the header of new WAL file, we retreat the
> replication starting location to the beginning of the WAL file when
> reconnecting to the primary.
>
> The following code (in XLogPageRead) validates the header of new
> WAL file.
>
> ----------------------
>        if (switched_segment && targetPageOff != 0)
>        {
>                /*
>                 * Whenever switching to a new WAL segment, we read the first page of
>                 * the file and validate its header, even if that's not where the
>                 * target record is.  This is so that we can check the additional
>                 * identification info that is present in the first page's "long"
>                 * header.
>                 */
>                readOff = 0;
>                if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ)
>                {
>                        ereport(emode_for_corrupt_record(emode, *RecPtr),
>                                        (errcode_for_file_access(),
>                                         errmsg("could not read from log file %u, segment %u, offset %u: %m",
>                                                        readId, readSeg, readOff)));
>                        goto next_record_is_invalid;
>                }
>                if (!ValidXLOGHeader((XLogPageHeader) readBuf, emode))
>                        goto next_record_is_invalid;
>        }

OK, thanks for the explanation. Is there a race condition here? That is,
it seems that with your patch this part of the code could get executed
after streaming is restarted, but before the streaming ever actually received
and wrote anything. So it would be checking the old header, not the newly
about-to-be received header.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2011-01-30 02:55:19 Re: WIP: RangeTypes
Previous Message Andrew Dunstan 2011-01-29 22:03:42 Re: [Mingw-users] mingw64