Re: Print physical file path when checksum check fails

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, hzhang(at)pivotal(dot)io, thomas(dot)munro(at)gmail(dot)com, sdn(at)amazon(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Print physical file path when checksum check fails
Date: 2020-02-20 03:36:40
Message-ID: 20200220033640.765tzkvwnc5vwbpb@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-02-19 16:48:45 +0900, Michael Paquier wrote:
> On Wed, Feb 19, 2020 at 03:00:54PM +0900, Kyotaro Horiguchi wrote:
> > I have had support requests related to broken blocks several times, and
> > (I think) most of *them* had a hard time locating the broken block or
> > even the broken file. I don't think it is useless at all, but I'm not sure
> > it is worth the additional complexity.
>
> I handle stuff like that from time to time, and any reports usually
> go down to people knowledgeable enough about PostgreSQL to know the
> difference. My point is that in order to know where a broken block is
> physically located on disk, you need to know four things:
> - The block number.
> - The physical location of the relation.
> - The size of the block.
> - The length of a file segment.
> The first two items are printed in the error message, and you can
> guess easily the actual location (file, offset) with the two others.

> I am not necessarily against improving the error message here, but
> FWIW I think that we need to consider seriously if the code
> complications and the maintenance cost involved are really worth
> saving from one simple calculation.

I don't think it's that simple for most.
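
For reference, the calculation under discussion can be sketched roughly as
below. This is only an illustration, assuming a default build (BLCKSZ of
8192 bytes and RELSEG_SIZE of 131072 blocks, i.e. 1GB segments), the main
fork of an md.c-managed relation, and the convention that segment N > 0 is
stored in a file named "<relpath>.N". The struct and helper names are
hypothetical and are not PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed default build-time values; a custom build may differ. */
#define BLCKSZ      8192        /* bytes per block */
#define RELSEG_SIZE 131072      /* blocks per segment file (1GB / 8kB) */

/* Hypothetical helper type: where a block lives on disk. */
typedef struct BlockLocation
{
    uint32_t    segno;          /* segment file number */
    uint64_t    offset;         /* byte offset within that segment file */
} BlockLocation;

/* Map a relation block number to (segment number, byte offset). */
static BlockLocation
locate_block(uint32_t blkno)
{
    BlockLocation loc;

    loc.segno = blkno / RELSEG_SIZE;
    loc.offset = (uint64_t) (blkno % RELSEG_SIZE) * BLCKSZ;
    return loc;
}

/*
 * Build the segment file name: segment 0 is the bare relation path,
 * later segments get a ".N" suffix.  Returns snprintf's result.
 */
static int
format_segment_path(char *buf, size_t len, const char *relpath,
                    uint32_t segno)
{
    assert(buf != NULL && relpath != NULL);

    if (segno == 0)
        return snprintf(buf, len, "%s", relpath);
    return snprintf(buf, len, "%s.%u", relpath, (unsigned int) segno);
}
```

With these assumed defaults, block 131073 falls in segment 1 at byte
offset 8192, so for a made-up relation path base/13593/16384 the file to
look at would be base/13593/16384.1. Anything that segments differently
would need its own version of this mapping.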

And if we e.g. ever get the undo stuff merged, it'd get more
complicated, because they segment entirely differently. Similarly, if we
ever manage to move SLRUs into the buffer pool and checksum them, it'd
again work differently.

Nor is it architecturally appealing to handle checksums in multiple
places above the smgr layer: For one, that requires multiple places to
compute and verify them. But also, as the way checksums are computed
depends on the page format etc, it'll likely change for things like
undo/SLRU - which then again will require additional smarts if done above
the smgr layer.

> Particularly, quickly reading through the patch, I am rather unhappy
> about the shape of the second patch which pushes down the segment
> number knowledge into relpath.c, and creates more complication around
> the handling of zero_damaged_pages and zero'ed pages. -- Michael

I do not like the SetZeroDamagedPageInChecksum stuff at all, however.

Greetings,

Andres Freund
