Re: New WAL code dumps core trivially on replay of bad data

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Amit kapila <amit(dot)kapila(at)huawei(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: New WAL code dumps core trivially on replay of bad data
Date: 2012-08-20 15:25:40
Message-ID: 711.1345476340@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 20.08.2012 17:04, Tom Lane wrote:
>> Uh, no, you misread it. xl_tot_len is *zero* in this example. The
>> problem is that RecordIsValid believes xl_len (and backup block size)
>> even when it exceeds xl_tot_len.

> Ah yes, I see that now. I think all we need then is a check for
> xl_tot_len >= SizeOfXLogRecord.

That should get us back to a reliability level similar to the old code.
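
For illustration, a minimal sketch of that header-level check. The struct and
the names prefixed demo_/DEMO_ are hypothetical, simplified stand-ins, not the
real XLogRecord definition or the actual patch. It rejects a record whose
xl_tot_len cannot even hold the header, and one whose xl_len claims more bytes
than xl_tot_len leaves room for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the record header (assumption). */
typedef struct DemoXLogRecord
{
    uint32_t    xl_tot_len;     /* total length of record, header included */
    uint32_t    xl_len;         /* length of the rmgr data that follows */
} DemoXLogRecord;

#define DEMO_SIZE_OF_XLOG_RECORD ((uint32_t) sizeof(DemoXLogRecord))

/* Returns true only if the two length fields are mutually consistent. */
bool
demo_header_lengths_sane(const DemoXLogRecord *rec)
{
    assert(rec != NULL);

    /* A record can never be shorter than its own header. */
    if (rec->xl_tot_len < DEMO_SIZE_OF_XLOG_RECORD)
        return false;

    /*
     * Compare by subtracting from xl_tot_len rather than adding to xl_len,
     * so that a huge xl_len cannot wrap around and slip through.
     */
    if (rec->xl_len > rec->xl_tot_len - DEMO_SIZE_OF_XLOG_RECORD)
        return false;

    return true;
}
```

With xl_tot_len = 0 and a nonzero xl_len, as in the example upthread, the first
test already fails and nothing past the header is examined.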

However, I think that we also need to improve RecordIsValid so that at
each step, it checks it hasn't overrun xl_tot_len *before* touching the
corresponding part of the record buffer.

> I was thinking that we might read gigabytes worth of bogus WAL into the
> memory buffer, if xl_tot_len is bogus and large, e.g. 0xffffffff. But now
> that I look closer, the xlog record is validated after reading the first
> continuation page, so we should catch a bogus xl_tot_len value at that
> point. And there is a cross-check with xl_rem_len on every continuation
> page, too.

Yeah. Even if xl_tot_len is bogus, we should realize that within a
couple of pages at most. The core of the problem here is that
RecordIsValid is not being careful to confine its touches to the
guaranteed-to-exist bytes of the record buffer, i.e. 0 .. xl_tot_len-1.
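
A rough sketch of the kind of bounded walk meant here, under stated
assumptions: the record layout below (header, xl_len bytes of rmgr data, then
n_blocks length-prefixed backup blocks) and every Demo/demo_ name are invented
for illustration and do not match the real on-disk format. CRC checking is
left out. The point is only that each step compares against the bytes still
remaining within xl_tot_len before it reads or skips anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; n_blocks is an invented field (assumption). */
typedef struct DemoRecordHeader
{
    uint32_t    xl_tot_len;     /* total record length, header included */
    uint32_t    xl_len;         /* length of rmgr data after the header */
    uint32_t    n_blocks;       /* number of backup blocks that follow */
} DemoRecordHeader;

/* Hypothetical per-backup-block header (assumption). */
typedef struct DemoBlockHeader
{
    uint32_t    data_len;       /* length of the block data that follows */
} DemoBlockHeader;

/*
 * Caller guarantees that sizeof(DemoRecordHeader) bytes are readable at buf,
 * and that xl_tot_len bytes are readable once xl_tot_len has been accepted.
 */
bool
demo_record_is_valid(const unsigned char *buf)
{
    DemoRecordHeader hdr;
    const unsigned char *p;
    uint32_t    remaining;
    uint32_t    i;

    assert(buf != NULL);
    memcpy(&hdr, buf, sizeof(hdr));

    if (hdr.xl_tot_len < sizeof(hdr))
        return false;           /* cannot even hold the header */
    remaining = hdr.xl_tot_len - (uint32_t) sizeof(hdr);
    p = buf + sizeof(hdr);

    if (hdr.xl_len > remaining)
        return false;           /* rmgr data would overrun xl_tot_len */
    p += hdr.xl_len;
    remaining -= hdr.xl_len;

    for (i = 0; i < hdr.n_blocks; i++)
    {
        DemoBlockHeader blk;

        if (remaining < sizeof(blk))
            return false;       /* check before reading the block header */
        memcpy(&blk, p, sizeof(blk));
        p += sizeof(blk);
        remaining -= (uint32_t) sizeof(blk);

        if (blk.data_len > remaining)
            return false;       /* check before skipping the block data */
        p += blk.data_len;
        remaining -= blk.data_len;
    }

    /* All parts together must account for exactly xl_tot_len bytes. */
    return remaining == 0;
}
```

Tracking "remaining" as an unsigned count that only ever decreases after a
successful comparison keeps every access inside 0 .. xl_tot_len-1 and avoids
overflow in the length arithmetic.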

regards, tom lane
