Re: BUG #6425: Bus error in slot_deform_tuple

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6425: Bus error in slot_deform_tuple
Date: 2012-02-04 16:11:43
Message-ID: CA+U5nMJkaLowf=Vksbh30MBHMQdT2D65fwZfTWF6SQfbT8429A@mail.gmail.com
Lists: pgsql-bugs, pgsql-hackers

On Fri, Feb 3, 2012 at 6:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I have not gotten very far with the coredump, except to observe that
>> gdb says the Assert ought to have passed: ...
>> This suggests very strongly that indeed the buffer was changing under
>> us.
>
> I probably ought to let the test case run overnight before concluding
> anything, but at this point it's run for two-plus hours with no errors
> after applying this patch:
>
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index cce87a3..b128bfd 100644
> *** a/src/backend/access/transam/xlog.c
> --- b/src/backend/access/transam/xlog.c
> *************** RestoreBkpBlocks(XLogRecPtr lsn, XLogRec
> *** 3716,3724 ****
>                }
>                else
>                {
> -                       /* must zero-fill the hole */
> -                       MemSet((char *) page, 0, BLCKSZ);
>                        memcpy((char *) page, blk, bkpb.hole_offset);
>                        memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
>                                   blk + bkpb.hole_offset,
>                                   BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
> --- 3716,3724 ----
>                }
>                else
>                {
>                        memcpy((char *) page, blk, bkpb.hole_offset);
> +                       /* must zero-fill the hole */
> +                       MemSet((char *) page + bkpb.hole_offset, 0, bkpb.hole_length);
>                        memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
>                                   blk + bkpb.hole_offset,
>                                   BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
>
>
> The existing code makes the page state transiently invalid (all zeroes)
> for no particularly good reason, and consumes useless cycles to do so,
> so this would be a good change in any case.  The reason it is relevant
> to our current problem is that even though RestoreBkpBlocks faithfully
> takes exclusive lock on the buffer, *that is not enough to guarantee
> that no one else is touching that buffer*.  Another backend that has
> already located a visible tuple on a page is entitled to keep accessing
> that tuple with only a buffer pin.  So the existing code transiently
> wipes the data from underneath the other backend's pin.
>
> It's clear how this explains the symptoms

Yes, that looks like the murder weapon.
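
For anyone following along, here is a minimal standalone sketch of the two orderings in the quoted patch. It is not the xlog.c code: the function names are hypothetical, the hole fields are passed as plain size_t arguments, and BLCKSZ is assumed to be the default 8192. Both functions leave the page in the same final state; the difference is that the first one passes through an all-zeroes page on the way there, while the second only ever writes zeroes into the hole.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLCKSZ 8192     /* assumption: PostgreSQL's default block size */

/*
 * Old ordering (hypothetical name): zero the whole page first, then copy
 * the data before and after the hole.  Between the memset and the copies,
 * every byte of the page is zero, including bytes holding live tuples.
 */
static void
restore_zero_whole_page(char *page, const char *blk,
                        size_t hole_offset, size_t hole_length)
{
    assert(hole_offset + hole_length <= BLCKSZ);

    memset(page, 0, BLCKSZ);
    memcpy(page, blk, hole_offset);
    memcpy(page + (hole_offset + hole_length),
           blk + hole_offset,
           BLCKSZ - (hole_offset + hole_length));
}

/*
 * Patched ordering (hypothetical name): zero only the hole.  Bytes outside
 * the hole are overwritten directly with the backup-block contents and are
 * never set to zero as an intermediate step.
 */
static void
restore_zero_hole_only(char *page, const char *blk,
                       size_t hole_offset, size_t hole_length)
{
    assert(hole_offset + hole_length <= BLCKSZ);

    memcpy(page, blk, hole_offset);
    memset(page + hole_offset, 0, hole_length);
    memcpy(page + (hole_offset + hole_length),
           blk + hole_offset,
           BLCKSZ - (hole_offset + hole_length));
}
```

Here blk is the backup block with the hole squeezed out, so it needs only BLCKSZ - hole_length valid bytes. The sketch is single-threaded and so cannot show the race itself; it only illustrates why the patched ordering never exposes a zeroed tuple to a backend holding just a pin.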

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
