Quick Links

Re: skink's test_decoding failures in 9.4 branch

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: skink's test_decoding failures in 9.4 branch
Date:	2016-07-20 23:44:07
Message-ID:	11035.1469058247@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> I guess either using valgrind's gdb server on error, or putting some
> asserts checking the size would be best. I can look into it, but it'll
> not be today likely.

I believe the problem is that DecodeUpdate is not on the same page as the
WAL-writing routines about how much data there is for an old_key_tuple.
Specifically, I see this in 9.4's log_heap_update():

if (old_key_tuple)
{
...
xlhdr_idx.t_len = old_key_tuple->t_len;

rdata[nr].data = (char *) old_key_tuple->t_data
+ offsetof(HeapTupleHeaderData, t_bits);
rdata[nr].len = old_key_tuple->t_len
- offsetof(HeapTupleHeaderData, t_bits);
...
}

so that the amount of tuple data that's *actually* in WAL is
offsetof(HeapTupleHeaderData, t_bits) less than what t_len says.
However, over in DecodeUpdate, this is processed with

xl_heap_header_len xlhdr;

memcpy(&xlhdr, data, sizeof(xlhdr));
...
datalen = xlhdr.t_len + SizeOfHeapHeader;
...
DecodeXLogTuple(data, datalen, change->data.tp.oldtuple);

and what DecodeXLogTuple does is

int datalen = len - SizeOfHeapHeader;
(so we're back to datalen == xlhdr.t_len)
...
memcpy(((char *) tuple->tuple.t_data) + offsetof(HeapTupleHeaderData, t_bits),
data + SizeOfHeapHeader,
datalen);

so that we are copying offsetof(HeapTupleHeaderData, t_bits) too much
data from the WAL buffer. Most of the time this doesn't hurt but it's
making valgrind complain, and on a unlucky day we might crash entirely.

I have not looked to see if the bug also exists in > 9.4. Also, it's
not very clear to me whether other call sites for DecodeXLogTuple might
have related bugs.

regards, tom lane

In response to

Re: skink's test_decoding failures in 9.4 branch at 2016-07-20 17:01:59 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Noah Misch	2016-07-21 01:15:46	Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Previous Message	Michael Paquier	2016-07-20 23:42:51	Re: Password identifiers, protocol aging and SCRAM protocol