Re: Broken hint bits (freeze)

From: Vladimir Borodin <root(at)simply(dot)name>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dmitriy Sarafannikov <dsarafannikov(at)yandex(dot)ru>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Broken hint bits (freeze)
Date: 2017-05-25 06:05:20
Message-ID: D7B95626-BF11-4E7E-AF10-0AB4B5BE9E79@simply.name
Lists: pgsql-hackers


> On 24 May 2017, at 15:44, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, May 24, 2017 at 7:27 AM, Dmitriy Sarafannikov
> <dsarafannikov(at)yandex(dot)ru> wrote:
>> It seems like the replica did not replay the corresponding WAL records.
>> Any thoughts?
>
> heap_xlog_freeze_page() is a pretty simple function. It's not
> impossible that it could have a bug that causes it to incorrectly skip
> records, but it's not clear why that wouldn't affect many other replay
> routines equally, since the pattern of using the return value of
> XLogReadBufferForRedo() to decide what to do is widespread.
>
> Can you prove that other WAL records generated around the same time as
> the freeze record *were* replayed on the master? If so, that proves
> that this isn't just a case of the WAL never reaching the standby.
> Can you look at the segment that contains the relevant freeze record
> with pg_xlogdump? Maybe that record is messed up somehow.

Not yet. Most such cases occurred long before our recovery window, so the corresponding WAL segments have already been deleted. We have since tuned our retention policy and are now looking for a fresh case.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
May the force be with you…
https://simply.name
