Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.
Date: 2016-04-02 06:12:00
Message-ID: CAD21AoAEpxeYu6z0Pg6=qC12D5bL1dORwXEQMHGek48ShDcsgg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 1, 2016 at 9:10 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
>> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>> >> After looking into the code around recovery, ISTM that the
>> >> cause is related to clearing the relation cache.
>> >> In heap_xlog_visible, if the standby server receives a WAL record, the
>> >> relation cache is eventually cleared in vm_extend, but if the standby
>> >> server receives an FPI, the relation cache is not cleared.
>> >> For example, after I applied the attached patch to HEAD (it might not be
>> >> the right way), this problem seems to be resolved.
>> >>
>> >> Is this a bug or not?
>> >
>> > It's a bug. I don't expect it causes queries to return wrong answers, because
>> > visibilitymap.c says "it's always safe to clear a bit in the map from
>> > correctness point of view." (The bug makes a visibility map bit temporarily
>> > appear to have been cleared.) I still call it a bug, because recovery
>> > behavior becomes too difficult to verify when xlog replay produces conditions
>> > that don't happen outside of recovery. Even if there's no way to get a wrong
>> > query answer today, this would be too easy to break later. I wonder if we
>> > make the same omission in other xlog replay functions. Similar omissions may
>> > cause wrong query answers, even if this particular one does not.
>> >
>> > Would you like to bisect for the commit, or at least the major release, at
>> > which the bug first appeared?
>> >
>> > I wonder if your discovery has any relationship to this recently-reported case
>> > of insufficient smgr invalidation:
>> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA(at)mail(dot)gmail(dot)com
>> >
>>
>> I'm not sure whether this bug is related to the other issue you mentioned,
>> but after further investigation, it seems to be reproducible even
>> on older versions.
>> At least, I reproduced it on 9.0.0.
>
> Would you try PostgreSQL 9.2.16? The visibility map was not crash safe and
> had no correctness implications until 9.2. If 9.2 behaves this way, it's
> almost certainly not a recent regression.

Yeah, I reproduced it on 9.2.0 and 9.2.16, so it's not a recent regression.
The commit is 503c7305a1e379f95649eef1a694d0c1dbdc674a, which
introduced the crash-safe visibility map.
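
For readers following the thread, the suspected mechanism can be sketched as a
toy model. This is not PostgreSQL code: the class, its fields, and the method
names are hypothetical, and it assumes only what is described above, namely
that the non-FPI replay path ends up clearing the cached relation state (via
vm_extend) while the FPI path does not, so a reader with a stale cached size
sees the visibility map bit as cleared.

```python
# Toy model only: all names here are hypothetical, not PostgreSQL APIs.
class ToyStandby:
    def __init__(self):
        self.vm_file_blocks = 0       # actual size of the map file
        self.cached_vm_blocks = None  # cached size; None means "not cached"

    def _vm_blocks(self):
        # Fill the cache lazily, then keep using it until it is cleared.
        if self.cached_vm_blocks is None:
            self.cached_vm_blocks = self.vm_file_blocks
        return self.cached_vm_blocks

    def replay_record(self, map_block):
        # Non-FPI path: extending the map also clears the cached size.
        if map_block >= self.vm_file_blocks:
            self.vm_file_blocks = map_block + 1
            self.cached_vm_blocks = None

    def replay_fpi(self, map_block):
        # FPI path: the page is restored, but the cached size is left alone.
        if map_block >= self.vm_file_blocks:
            self.vm_file_blocks = map_block + 1

    def bit_appears_set(self, map_block):
        # A map block beyond the cached size is treated as all bits clear.
        return map_block < self._vm_blocks()


def demo():
    via_record, via_fpi = ToyStandby(), ToyStandby()
    for standby in (via_record, via_fpi):
        standby.bit_appears_set(0)  # a reader caches the old size (0 blocks)
    via_record.replay_record(0)
    via_fpi.replay_fpi(0)
    return via_record.bit_appears_set(0), via_fpi.bit_appears_set(0)
```

In this model, demo() reports the bit as set after the record path and as
cleared after the FPI path, even though the map file is the same size in both
cases. The bit only appears cleared until something else refreshes the cache.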

Regards,

--
Masahiko Sawada
